Genetic Psychology 

MONOGRAPHS 

Child Behavior, Animal Behavior,
and Comparative Psychology 

EDITED BY

CARL MURCHISON 


John E. Anderson
University of Minnesota

Charlotte Bühler
Universität Wien

William H. Burnham
Clark University

Cyril Burt
University of London

Leonard Carmichael
Brown University

Ed. Claparède
Université de Genève

Edmund S. Conklin
University of Oregon

Sante De Sanctis
R. Università di Roma

Arnold Gesell
Yale University

William Healy
Judge Baker Foundation, Boston

Leta S. Hollingworth
Teachers College, Columbia University

Walter S. Hunter
Clark University

Buford Johnson
The Johns Hopkins University

Harold E. Jones
University of California

Truman L. Kelley
Harvard University

Yoshihide Kubo
Hiroshima Normal College

K. S. Lashley
University of Chicago

A. R. Luria
Akademiya Kommunisticheskogo Vospitaniya im. N. K. Krupskoi, Moskva

Toshio Nogami
Kyoto Imperial University

Ivan P. Pavlov
Gosudarstvennyi Institut Eksperimental'noi Meditsiny, Leningrad

Henri Piéron
Université de Paris

William Stern
Hamburgische Universität

Calvin P. Stone
Stanford University

Lewis M. Terman
Stanford University

Godfrey Thomson
University of Edinburgh

E. L. Thorndike
Teachers College, Columbia University

C. J. Warden
Columbia University

John B. Watson
New York City

Helen Thompson Woolley
Teachers College, Columbia University


Luberta M. Harden, Ph.D.

Assistant Editor

Volume XI

1932


Issued monthly by the
CLARK UNIVERSITY PRESS
Worcester, Massachusetts
U. S. A.






$7.00 per volume MONTHLY January, 1932 

Single numbers $2.00 Two volumes per year Volume XI, No. 1


GENERAL FACTORS IN TRANSFER OF
TRAINING IN THE WHITE RAT*

From the Animal Laboratory of the Department of Psychology,
Columbia University

Theodore A. Jackson


*Accepted for publication by C. J. Warden of the Editorial Board, and received in the Editorial Office, May 11,


Worcester, Massachusetts
Copyright, 1932, by Clark University
Entered as second-class matter December 1, 1925, at the post-office at
Worcester, Mass., under Act of March 3, 1879.






ACKNOWLEDGMENTS 

The writer wishes to express his deep appreciation to Professor C. J. Warden of the Department of Psychology, Columbia University, for supervising the experiment throughout, and for much kindly criticism in the preparation of the manuscript. Thanks are also due to Mr. Richard Fitch for many helpful suggestions in the statistical treatment of the results.

Theodore A. Jackson

Columbia University
New York City





CONTENTS

Acknowledgments

I. Introduction

II. Experiment I

III. Experiment II

IV. Discussion and Summary

References

Résumé en français

Referat auf deutsch


I

INTRODUCTION

The presence of transfer of training was noted in the early studies of motor habits in animals, although for the most part its report was incidental to the main purpose of the experiment. Thorndike (18) pointed out that previous experience enables a cat to form associations more quickly; for example, after solving six or eight problem boxes by different methods, the cat develops a tendency to claw at loose objects and does not try to squeeze through holes or to bite at bars. Watson (25) observed that trained rats, those that had been tested on several problems, learn faster than untrained ones. Yerkes (30) also noted, in his work on the dancing mouse, that animals learned a maze more quickly if they had previously run other mazes. Richardson (17) found a distinct difference between the learning of trained and untrained rats, the latter requiring more trials to learn a problem than the former. One of the first investigations of transfer of training in animals was that of Bogardus and Henke (1). They put white rats through a series of mazes, each maze being only slightly different from the preceding one. Transfer was positive in all cases, and it was greater when the alteration of pattern was made in the first part of the maze. Hunter (8) compared the learning of untrained pigeons with that of pigeons which had been previously trained in another maze. There was little or no difference between the two groups in the number of trials to learn the maze, although the learning curve for the trained group showed an earlier drop. That interference may exist is shown by the experiments on squirrels by Yoakum (32), who found that animals which had learned two problem boxes did not learn a third problem box as readily as animals which were put directly on the third problem. The response required for the first two problems was scraping, while for the third it was butting the nose against a latch. Cole (3) has also observed interference in the raccoon when the latch on a problem box was changed in position. Similar observations of interference were made on the monkey by Kinnaman (11) and others.

Somewhat later than the above studies are those of Dashiell (4, 5), Hunter (9), and Ho (7). They all used the white rat and found large positive transfer of training. Dashiell (4), in his earlier study, found that general adaptation to experimental conditions aids the rat in learning a maze, and in a recent study (5) he shows that learning a maze is in part the acquisition of a general orientation toward the food box. Hunter (9) has shown that even between mazes of opposite patterns transfer is positive, and he agrees with Webb in finding the locus of transfer in the first part of the learning curve for the second maze. An experiment by Ho (7) to determine the transfer effect of varying degrees of integration of the first problem gave inconclusive results, although there was some indication that the second habit was more readily acquired the greater the degree of integration of the first habit. Yerkes and Coburn (31) observed that a pig gives evidence of beneficial effects from one problem to another in the multiple-choice type of problem box.
The first systematic work with animals on transfer of training is that of Webb (26). He ran groups of white rats through different pairs of mazes. To determine the extent to which transfer depends on the second maze, he ran five groups of animals through the same first maze (Maze A) until they had reached the norm of four errorless trials out of five; then one group was transferred to Maze B, another to Maze C, another to D, another to E, and the last to F. To determine the dependence of transfer on the first maze, the second maze was held constant while the first maze varied, as B-A, C-A, D-A, E-A, F-A. In our Table 1 the percentage of savings for the various pairs of mazes is indicated.

The author concluded (1) that transfer is positive in nature, that is, the learning of one maze has beneficial effects in the mastery of a subsequent maze; (2) that transfer is a composite process consisting of both positive and negative elements, the total result being determined by the predominance of one or the other of these elements; (3) that the degree of transfer depends on at least four factors: (a) when the first maze is constant, the degree of transfer varies with the difficulty of the second maze; (b) when the second maze is constant, the degree of transfer varies with the difficulty of the first maze; (c) the degree of transfer varies with the degree of similarity between the two mazes; and (d) the degree of transfer is determined in part by the direction of transfer, that is, if the mazes were not similar, there was a greater transfer effect from one maze to the other than when they were learned in the opposite order; (4) that the locus of transfer, on the average, was confined to the first five trials, that is, the animals saved the equivalent of the first five trials on the second maze; and (5) that transfer produced a selective effect on the type of error made: fewer retracing errors were made in the second maze than in the first.

The work of Wiltbank (28) is essentially an exten- 
sion of that of Webb. The investigation was carried 
on in the same laboratory and under the same general 
conditions. The main purpose of his study was to 
determine whether or not transfer of training was 
cumulative through a series of mazes. Five different 
groups of white rats were used, and the order of mazes 
was rotated in such a manner that each group started 
with a different maze. His results showed no cumu- 
lative effect. 

Another problem that he attacked concerned the transfer effect of different degrees of partial learning of the first maze on the learning of the second. In one case the initial partial learning was on a less difficult maze; in another it was on a more difficult maze. Table 2 gives the percentages of transfer for the different degrees of partial learning.
TABLE 2

From the Less Difficult to the More Difficult Maze
(Maze E to Maze D)

        After 2   After 4   After 8   After 16  After complete
        trials    trials    trials    trials    mastery
        in E      in E      in E      in E      of E

Trials  -3.73     -15.2?    -?.81     34.63     15.44
Errors  -10.60    -2.?5     -19.84    43.50     44.97
Time    75.??     78.76     79.62     72.82     40.13
From the More Difficult to the Less Difficult Maze
(Maze D to Maze E)

        After 2   After 4   After 8   After 16  After complete
        trials    trials    trials    trials    mastery
        in D      in D      in D      in D      of D

Trials  -?.51     ?         ?         47.10     69.22
Errors  -??.55    ?         ?         51.01     77.69
Time    7?.?1     ?         ?         71.46     65.11

This part of the study was extended so that each of the above groups, after having completely learned the second maze, was then changed back to complete the learning of the first maze. In this case the results showed positive transfer in nearly all cases. The attempt was also made to determine whether transfer effects were greater from more difficult or from less difficult mazes. The results showed slightly greater transfer where the more difficult maze came first. In still another phase of the study an identical part of one maze was inserted in the second maze; however, the results showed no greater saving of errors in the identical part.

In the summary of Wiltbank's results, the following points are made: (1) that transfer between two mazes of the general type used is predominantly positive, (2) that transfer through a series of mazes is persistent although not cumulative, (3) that, when two adjacent mazes have identical parts, the more expeditious learning of the second was not due to a saving of errors in the identical part, (4) that average savings are higher from more difficult to less difficult mazes than the opposite, (5) that the transfer effect between two mazes when the first is only partially learned was not positive until the partial learning reached 16 trials, and (6) that, when transfer was made from a maze completely learned to one already partly learned, the later maze was learned with a saving in trials when the partial learning had been 2, 4, or 8 trials, but transfer was negative for 16 trials.

The foregoing survey of literature on transfer of training in the motor capacities of animals may be summarized as follows:

1. Transfer is predominantly positive in mazes of average difficulty.

2. Transfer is not cumulative through a series of mazes.

3. The locus of transfer appears to be in the first part of the learning curve of the second problem.

4. General orientation toward the food box is an important part of maze learning.

5. The similarity between two mazes and their relative difficulty are important factors in determining the amount of transfer.

6. When two adjacent mazes have identical parts, the more expeditious learning of the second maze does not seem to be due to a saving of errors in the identical part.

7. The degree of transfer varies to some extent with the amount of partial learning on the first problem.

It will be noted that there has been little or no attempt to make an experimental analysis of general factors in transfer of training in animal maze learning. The main purpose of the present study is to deal with certain aspects of this general problem. The fact that transfer has been found to be located principally in the first part of the learning curve of the second problem suggests that it may be due to emotional factors rather than to knowledge elements. That transfer effects are not cumulative through a series of mazes carries the same suggestion. The fact that transfer effects are greater when the amount of partial learning is greater is also consistent with the idea that transfer may take place through emotional factors rather than knowledge factors. If transfer is produced in such fashion, then initial practice on almost any type of apparatus should result in some degree of positive transfer to the maze. The purpose of Experiment I of the present study is to test this possibility.

Definite conclusions have not yet been reached concerning the transfer effects of varying degrees of partial learning of the first problem. However, if initial practice were carried beyond complete learning into varying degrees of overlearning, the influence of different amounts of initial practice might be made clearer. Experiment II aims to deal with this aspect of transfer and, if possible, to relate the results to the previous findings on initial partial learning.
In making the present experimental analysis of transfer of training in the white rat, the maze was chosen as the principal apparatus. The reasons for this choice were, first, that the maze is a convenient apparatus to use with rats as subjects; secondly, that the particular maze used, the Warner-Warden design (24), allows systematic variations in pattern; and, thirdly, that the maze, if not too simple in pattern, is a sufficiently reliable test or measuring device for group differences. Two different patterns, Maze X and Maze Y, as shown in Figures 1 and 2, were used in this investigation. Both patterns contain the same number of blind alleys and the same number of pathway units. The pathway of Maze X is a "simple zigzag," whereas that of Maze Y is somewhat more complicated and may be designated as a "double zigzag" pattern.



FIGURE 1

Pattern of Maze X

E refers to entrance; F to the food box; D to doors; and 1, 2, 3, 4, 5, and 6 to the various blind alleys.
FIGURE 2

Pattern of Maze Y

E refers to entrance; F to the food box; D to doors; and 1, 2, 3, 4, 5, and 6 to the various blind alleys.

While a particular pattern was being used, it was placed on a large 5½- by ?-foot table which was covered with battleship linoleum to provide a seamless floor for the maze. The small room, in which the maze is the only apparatus, was illuminated by a single lamp with a large milk-glass globe, which gave quite a diffuse light and was directly above the maze. The only window in the room was darkened by means of a cased-in black window shade, and the doors to the maze room were kept closed during experimentation. In certain of the various transfer situations the following apparatus was used: (a) a revolving-wheel activity cage (Richter type), (b) the Jenkins-Warden motivation apparatus, and (c) a simple problem box which the animal operates by stepping on a plate in the floor.

The animals, supplied by the Albino Supply, Inc., Philadelphia, Pa., were medium size, 50 to 100 grams, and from 60 to 90 days old. Only males were used, in order to avoid any possible complications from the oestrous cycle in the female. These animals were housed in small cages, 18 by 12 by 12 inches, which were closed on three sides and had a solid bottom, sawdust being used for bedding. Fresh water was supplied at all times by means of inverted siphon bottles. All animals were fed for one minute in the food box and eight minutes in the feeding cage after their daily run in the maze. The diet was whole-wheat bread soaked in milk, supplemented with a weekly ration of greens.

As soon as the animals were received in the laboratory, they were taken one at a time and placed in the living cages in random order, five animals to a cage. On the second day their ears were clipped for marking, and on the fourth day they were started on the training of one group or another (see Experiment I). The reason for this three-day period between the arrival of the rats and the beginning of the experiment was to allow them to recover from any ill effects of shipment and to become adjusted to the new diet. A longer period could not have been allowed without interfering with the plan of the experiment.

The motivation used throughout was 24 hours' starvation, the incentive employed in the apparatus being a sample of the regular diet. One trial was given a day, and at the end of the trial each animal was allowed to feed one minute in the food-box compartment of the maze, after which it was put back in the living cage. All trials were given in the afternoon between 1:00 and 5:00 P.M. The following records were kept for each trial: (a) time scores, the time required for the animal to go from the entrance to the food box; (b) three types of error scores: "A" errors, or entering a blind alley when the animal was oriented toward the food box; "B" errors, or entering a blind alley when the animal was oriented toward the entrance compartment; and "C" errors, or retracing sections of the pathway. A blind-alley error was counted when the animal entered far enough that the tips of its ears could be seen within. If an animal had not reached the food box five minutes after it had been inserted in the maze, it was removed from the pathway or blind alley and placed in the food-box compartment, where it was allowed the regular one-minute feeding period. The norm of mastery used throughout the present study was four errorless trials out of five. The total number of trials to learn was the score used in the main analysis of results. This measure does not include the first trial, since the latter was considered to be a part of the general preliminary. The five trials of the norm were omitted from the score, as is usual in maze comparisons.
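The scoring rules just described can be sketched in a few lines. The sketch assumes that "four errorless trials out of five" means the first run of five consecutive trials (after the preliminary first trial) containing at least four errorless ones; how trials ended by the five-minute limit were scored is not stated, so that case is not modelled. The function names and the error record are hypothetical.

```python
from typing import Optional, Sequence

NORM_WINDOW = 5     # trials making up the norm of mastery
NORM_ERRORLESS = 4  # errorless trials required within that window


def classify_error(kind: str, toward_food_box: bool) -> str:
    """Label one error: blind-alley entries are "A" or "B" by orientation,
    retracing a section of the pathway is "C"."""
    if kind == "retrace":
        return "C"
    if kind == "blind":
        return "A" if toward_food_box else "B"
    raise ValueError(f"unknown error kind: {kind!r}")


def trials_to_learn(errors_per_trial: Sequence[int]) -> Optional[int]:
    """Trials-to-learn score from the total errors on each trial, in order.

    Trial 1 (index 0) belongs to the preliminary and is never scored.
    The norm is assumed to be the first five consecutive later trials with
    at least four errorless ones; those five trials are excluded as well.
    Returns None if the norm is never reached.
    """
    last_start = len(errors_per_trial) - NORM_WINDOW
    for start in range(1, last_start + 1):
        window = errors_per_trial[start:start + NORM_WINDOW]
        if sum(1 for e in window if e == 0) >= NORM_ERRORLESS:
            # Count trials 2 .. start (1-based), i.e. those before the norm.
            return start - 1
    return None


# Hypothetical record of total errors per trial for one rat.
record = [7, 5, 3, 2, 1, 0, 0, 1, 0, 0, 0]
print(trials_to_learn(record))                          # 4
print(classify_error("blind", toward_food_box=False))   # B
```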

The attempt was made to handle all animals as uniformly as possible. The method finally adopted was to grasp an animal around the body with the left hand in such a way that the thumb and forefinger are at either side of the animal's head. Using such a technique, an animal may be carried to the apparatus and inserted into the pathway with the least disturbance; furthermore, the right hand is free to open and close doors and to operate the stop-watch. During a trial the experimenter remained very quiet, thus eliminating possible distractions from this source. Additional points of procedure will be mentioned in connection with the different experiments as they are discussed.



II

EXPERIMENT I

The object of this experiment was to determine the extent to which transfer of training in maze learning of the white rat may be due to general factors, such as may be involved in situations other than the maze. To make such a determination, several groups of animals were tested on the same maze after having previously had different kinds of activity. The various transfer situations used were graduated from very general types of adjustment, such as general laboratory experience, to training on different kinds of apparatus. If such general factors are significant in maze learning, it is important to find out just how much of such activity is necessary to bring about a reliable difference on the maze test, and how much transfer can be brought about by these factors. The first five groups, as shown in Table 3, were given various kinds of general activity previous to the maze test. Group 1 had a minimum of such activity, and each succeeding group was given more and more. The animals in each of these groups were placed singly in new situations, where they were allowed to explore and become adjusted. No food was given, and no particular kind of response was required of them. They were merely given opportunity to get used to a new situation. The last four groups of Table 3 were trained on one or more kinds of apparatus, being rewarded with food after having made a particular response. The groups are arranged in the table in a progressive series according to the total amount of activity in the transfer situation.

TABLE 3

Showing Groups Used

Group    Number of   Activity previous to maze test                Maze
number   animals                                                   test

1        20          None (started directly in Maze X)             Maze X
2        20          3-day maze preliminary (5 min. in             Maze X
                     entrance and food box of Maze X)
3        21          2-weeks' laboratory adjustment;               Maze X
                     3-day maze preliminary
4        19          2-weeks' laboratory adjustment;               Maze X
                     9-day maze preliminary
5        20          2 weeks of daily handling;                    Maze X
                     3-day maze preliminary
6        1?          2-weeks' daily contact with maze              Maze X
                     sections; 3-day maze preliminary
7        19          2 weeks on problem box, 1 trial daily;        Maze X
                     3-day maze preliminary
8        18          Contact with 3 apparatuses during             Maze X
                     2 weeks;* 3-day maze preliminary
9        ?           2-weeks' laboratory adjustment; 3-day         Maze X
                     maze preliminary; learned Maze Y

*First four days, revolving wheel; next five days, Jenkins-Warden motivation apparatus; last five days, simple problem box.

Group 1 was put directly into the maze after being kept in the laboratory three days. Group 2, in addition to this, was given the usual three days of preliminary activity previous to the maze test. This preliminary consisted of a five-minute feeding period in the entrance compartment and another in the food-box compartment daily. During this procedure the animal did not have access to the pathway of the maze, but was allowed to explore over the top of the apparatus near the compartment in which it was feeding. The transfer situation for Group 3 consisted of a two-weeks' general laboratory adjustment followed by a three-day maze preliminary. These animals were fed regularly, just as in the case of animals being used on maze tests, but were handled as little as possible. Group 4 was given a similar two-weeks' laboratory adjustment, but the maze preliminary was increased to nine days instead of three. The transfer situation for Group 5 involved two weeks of handling, corresponding to the period of laboratory adjustment of the two previous groups, followed by the regular three-day maze preliminary. The handling in this case was designed to be approximately equivalent to the amount involved when an animal was being used in an apparatus. The animals were carried from their living cages and put in other living cages which were located at approximately the same distance away as the maze.



FIGURE 3

Arrangement of Maze Sections Used with Group 6
After five minutes' exploration in one cage, each animal was placed in another cage for a similar period, then put back in the feeding cage and fed, and finally returned to the living cage. This procedure enabled the experimenter to keep the amount of handling constant from day to day and from rat to rat. Group 6 was put in a transfer situation which allowed it to make contact with maze materials, as shown in Figure 3. Two weeks of daily runs were given through the three pathway sections of the maze. This activity constituted a kind of training, since each animal was rewarded with a sample of the regular diet in the food compartment at the end of the run. The transfer situation for Group 7 consisted of two weeks of daily trials in a problem box, followed by the maze preliminary. Group 8 was given training on three kinds of apparatus as follows: (a) four days in a revolving-wheel activity cage, (b) five days in the Jenkins-Warden motivation apparatus, and (c) five days in the problem box. The animal was put in the revolving wheel for four minutes each day. In the Jenkins-Warden motivation apparatus each animal was allowed to run across the grid, without shock, to food, being allowed only a nibble, and was then put back in the entrance compartment for another run. This procedure was kept up for three minutes, after which one minute of uninterrupted feeding was allowed in the food compartment. The practice on the problem box was given just as in the case of Group 7, with the usual three-day preliminary. Group 9 had considerably more training than the other groups. It was given the two-weeks' laboratory adjustment, followed by a maze preliminary on Maze Y; then, after learning this maze, it was tested on Maze X. An average of 21 trials was required to learn Maze Y. In comparing this group with any others, it must be kept in mind that the transfer situation in this case was considerably longer than the two-weeks' period involved in the case of the other groups.
The results of this experiment show definitely that there is positive transfer of training from the various types of transfer situations used to maze training. Table 4 shows the distribution of scores, in trials to learn, for the various groups. It is evident from inspection that the range of scores is smaller for those groups which had the greater amount of pre-maze activity. The arithmetic means and standard deviations for the groups are shown in Table 5 and Figure 4. The smaller average number of trials to learn in the case of the groups with greater previous experience clearly indicates the transfer effect. The correlation coefficient of -.88 between the amount of pre-maze activity and the number of trials required to learn indicates the same trend.
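The text does not state how the amount of pre-maze activity was scaled for the correlation of -.88; one plausible reading, assumed here, is the rank of each group in the progressive series of Table 3. The sketch below computes an ordinary Pearson product-moment coefficient on hypothetical values, not on the study's data.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between paired observations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical: activity rank (1 = least) vs. mean trials to learn.
activity = [1, 2, 3, 4, 5]
trials = [30, 26, 27, 20, 18]
print(round(pearson_r(activity, trials), 2))  # -0.94
```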

The reliability of the differences between group averages is shown in Table 6. It may be noted that there is a reliable difference between Groups 1 and 5. This means that the group which had merely been handled made a reliably lower average score when tested on the maze than the untrained group. All other groups which had had more experience than the amount involved in handling also showed reliable differences when compared with an untrained group. The differences between Groups 1 and 3, and between Groups 1 and 4, were also almost reliable, the D/σD being 2.51 and 2.25, respectively. There was no greater transfer for Group 4 than for Group 3, in spite of the fact that the former had six extra days of preliminary activity. It is evident, therefore, that the extra preliminary activity is ineffective in producing greater transfer effects. This seemed rather strange, since the three-day preliminary itself exerted considerable influence. Another unexpected result was the comparison between Groups 5 and 6. Both groups had received the same amount of handling, but Group 6 had also made contact with maze materials during its pre-maze activity. Still, the greater amount of transfer occurred in the case of Group 5. Evidently mere running through maze sections without blind alleys has little or no transfer significance. In the case of Groups 7 and 8, the latter had had more experience, since it had been trained on two more kinds of apparatus, but the former showed the greater transfer effect. This may seem a bit surprising, but, since the difference is not reliable, it may very well be due to chance. At least, greater transfer to the maze was not effected by training on three apparatuses rather than one, if the total amount of practice is constant. The most unexpected result appears in the comparison of the average score of Group 9 with other averages: a somewhat smaller amount of transfer occurred from maze to maze than took place from problem box to maze.
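The reliability criterion used above is the ratio of a difference between two means to its standard error, D/σD. A minimal sketch, assuming independent groups, the σ/√N form of the standard error of a mean, and a conventional threshold of 3 for a "reliable" difference (the text implies a threshold somewhat above 2.51 but does not state one). The group values are hypothetical, not those of Table 5.

```python
import math


def standard_error_of_mean(sd: float, n: int) -> float:
    """Standard error of a group mean, sigma / sqrt(N) (one common convention)."""
    return sd / math.sqrt(n)


def critical_ratio(mean1: float, sd1: float, n1: int,
                   mean2: float, sd2: float, n2: int) -> float:
    """D / sigma_D for two independent groups."""
    se1 = standard_error_of_mean(sd1, n1)
    se2 = standard_error_of_mean(sd2, n2)
    sigma_d = math.sqrt(se1 ** 2 + se2 ** 2)
    return abs(mean1 - mean2) / sigma_d


RELIABLE = 3.0  # assumed threshold for a "reliable" difference

# Hypothetical groups: (mean trials, SD, N) for each.
ratio = critical_ratio(30.0, 8.0, 20, 22.0, 6.0, 20)
print(round(ratio, 2), ratio >= RELIABLE)  # 3.58 True
```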

As will be seen by referring to Tables 3 and 5, the transfer situations used may be thrown into four more general groups on the basis of similarity. Combinations were made as follows: Groups 1 and 2, Untrained; Groups 3 and 4, Adjusted; Groups 5 and 6, Handled; Groups 7 and 8, Trained; and, of course, Group 9, which was Maze-trained, stands alone. The formation of these combinations is justified not only on the basis of similarity of transfer situations but also because the reliability of the difference between any two groups going into the same combination is low, being less than one in all cases. In the computation of averages of the combined groups, weight was given to the single group averages according to their respective populations, and the standard deviations were computed by a formula found in Yule (33, p. 142).
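The combined averages and standard deviations can be sketched with the usual formulas for pooling groups: a mean weighted by group size, and a variance that adds, for each group, its own variance and the squared deviation of its mean from the combined mean. We assume this is the formula the author took from Yule; it presumes each group's σ was computed with N (not N - 1) in the denominator. The values in the example are hypothetical.

```python
import math
from typing import Sequence, Tuple


def combine_groups(groups: Sequence[Tuple[int, float, float]]) -> Tuple[float, float]:
    """Mean and SD of pooled groups, given (N, mean, SD) for each group.

    Mean: population-weighted average of the group means.
    SD:   sigma^2 = sum(N_i * (sigma_i^2 + d_i^2)) / sum(N_i), d_i = M_i - M.
    """
    total_n = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / total_n
    var = sum(n * (sd ** 2 + (m - mean) ** 2) for n, m, sd in groups) / total_n
    return mean, math.sqrt(var)


# Hypothetical: two groups of 20 pooled into one combination.
mean, sd = combine_groups([(20, 10.0, 2.0), (20, 14.0, 2.0)])
print(mean, round(sd, 3))  # 12.0 2.828
```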

TABLE 8

Reliability of the Differences between Averages of Trials to Learn for the Combined Groups

The arithmetic means and the standard deviations for the various combinations are shown in Table 7. Here, as before, the combination with the more previous experience makes lower average scores on the maze test. The reliability of the differences between different combinations is shown in Table 8. It may be seen that there is a reliable difference between the average score made by the untrained group and any other group, although the reliability indices are higher for the groups with the greater amount of experience. The fact of a reliable difference between the untrained and the adjusted groups may be interpreted to mean that those animals which had had a two-weeks' laboratory adjustment and a three-day maze preliminary make reliably lower scores when tested later on the maze than those which had not had such experience. The relative importance of the remaining transfer situations may be seen by examining the reliability of the difference between the corresponding group averages. Since the handled group had had all the experience of the adjusted group, in addition to being handled, a rough indication of the significance of handling per se may be had by comparing the adjusted and the handled groups (see Table 8). Similarly, an indication of the importance of general training per se is obtained by comparing the handled group with the group which had general training. All the factors may thus be measured by the usual method employed in computing residual transfer effects. When this has been done, the relative importance of the several factors is as follows: adjustment, 2.89; handling, 1.81; general training, 2.00; maze training, 1.19.


TABLE 9

Showing Amount and Percentage of Savings in Trials to Learn the Maze and in Average Total Errors for the Various Combined Groups

Combined     Savings in trials to learn     Savings in total errors
group        Amount      Percentage         Amount      Percentage
Adjusted     ?           23.9               4.8         33.1
Handled      35.8 (column uncertain)
The savings in trials to learn and in total "A" errors appear in Table 9 and show the same general trend concerning transfer effects: the greater the previous experience, the lower the score when tested in the maze. The two sets of scores are very closely related. In both cases the adjusted group has the smallest amount of transfer, and the group with general training shows the greatest transfer effect. The saving in the case of the handled group is almost the same in both scores and ranks in either case between the adjusted and the maze-trained groups. Measured by the number of trials to learn the maze, the group with general training shows more saving than the maze-trained group, but the two groups show equal saving in total error scores. This difference becomes clear when the error-learning curves of the two groups are examined (see Figure 5).

It may be noted that the maze-trained group effected a greater saving of errors on the first two or three trials, and this resulted in a relatively lower total error score.
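The savings in Table 9 can be illustrated with a short computation. This section does not define the savings explicitly, so the sketch assumes the common convention. The amount is the untrained group's average minus the trained group's average, and the percentage is that amount expressed relative to the untrained average. The averages used below are hypothetical.

```python
def savings(untrained_avg, trained_avg):
    """Return (amount, percentage) saved relative to the untrained group.

    A negative amount would indicate negative transfer.
    """
    amount = untrained_avg - trained_avg
    return amount, 100.0 * amount / untrained_avg


# Hypothetical averages: untrained 30.0 trials to learn, trained 22.5.
amount, pct = savings(30.0, 22.5)
print(f"saving {amount:.1f} trials ({pct:.1f}%)")  # saving 7.5 trials (25.0%)
```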

In general, adjustment to laboratory conditions, etc., 
was markedly effective in producing transfer to maze 
learning, and handling was somewhat more effective. 
General training in other than maze apparatus is still 
more effective in bringing about transfer to the maze, 
and such transfer effect is apparently as great as that be- 
tween two mazes, even when the maze-training period 
was longer. 

It is obvious that there can be no transfer of specific knowledge elements in the case of the adjusted and the handled groups, where no apparatus was used in the training situations. It seems clear, also, that the same can be said of the groups given general training, since the types of apparatus used were in no way similar to the test apparatus. In fact, only in the case of Group 9, which was transferred from maze to maze, does there appear to be any possibility of specific identical elements. Here the transfer was less than from an entirely diverse apparatus, which suggests that specific knowledge elements may not have been of much importance even in this case. It thus appears that general factors of some sort must play a major role in the usual transfer experiment on the maze. The nature of these general factors cannot be precisely determined from the data at hand. Still, some suggestions may be in order regarding the factors that might have been operative under the several conditions: adjustment, handling, and general training.

Adjustment, which consisted mainly of two weeks spent in the laboratory, allowed the animals to recover from any ill effects of shipping, such as starvation, unfavorable temperature conditions, etc. During this time, the fact that they became accustomed to living in metal cages may have helped them to adjust later to the maze, which was also made of metal. Getting used to the new diet may also have been a factor, and the adjustment to the rhythm of feeding may have been of equal importance. The three-day preliminary on the test maze gave the animals an opportunity to explore the external features of the apparatus and to become adapted to being placed alone in a strange situation. Feeding in the entrance and food box doubtless caused an association to be formed between the maze and food, and the response of feeding in the maze may have eliminated fear to some extent.

In handling, the animals naturally got used to being picked up by the experimenter and removed to another place. The amount of squirming and wriggling was noticeably reduced in the two-weeks' period. In the transfer situation, handling involved putting each animal singly into two different cages for five minutes each. The factor of exploring these new situations may also have had some transfer value.

General training in an apparatus involved all the experience included in adjustment and in handling. In addition, it gave an opportunity to explore a strange situation which was very unlike either the living cage or the maze. During this training the animals seemed to develop a tendency to react to the food compartment, and, when the animal was transferred to the maze, vigorous exploratory movements began almost immediately. Apparently, an association had been formed between apparatus and securing food, and the connection was sufficiently general to carry over from one apparatus to another.

In maze training, all the above-mentioned factors were involved, together with certain specific knowledge elements. By knowledge elements in the maze is meant specific turns, "blind-alleyness" as contrasted with true-pathway conditions, and the like. But specific knowledge may exert either a positive or a negative transfer effect, as has been shown by Webb (26). Apparently, the net transfer effect from maze to maze was somewhat less for this reason. All the general factors which produce positive transfer were operative, but there were, in addition, specific knowledge elements which produced negative transfer.

The question as to the locus of transfer, or the stage 
in the learning process at which transfer is most effec- 
tive, has offered an interesting problem to all students 
of this general topic. The locus of transfer is usually 
found by comparing the learning curve of a group 
which has had some sort of previous training with the 
curve of another group, used on the same problem, 
which has not had previous training. 

As is well recognized, ordinary learning curves show distortion because animals drop out as they master the problem. The population gradually decreases, and the scores of the poorer rats receive undue weight. In order to avoid this, the following method was used in plotting the curves of Figures 5 and 6. The score of each animal was extended to the sixteenth trial, even for animals that had completed the learning earlier, because at that point the several curves were approximately flat. Time values in such cases were computed on the basis of four seconds per trial, which was the modal score in the trials comprising the norm. The trials in this extension were considered errorless, since the animals had actually mastered the problem. The time curves shown in Figure 6 were also smoothed by computing the average in each case on the middle 80% of the scores. Since the distributions involved are approximately normal, this procedure does not disturb the average, and it yields more representative curves.
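The plotting procedure can be sketched as follows. The function names are hypothetical. The sixteen-trial extension, the errorless padded trials, the four-second time value, and the middle-80% average follow the text. Rounding the number of scores trimmed at each end to the nearest whole animal is an assumption of this sketch.

```python
def extend_record(errors, times, n_trials=16, seconds_per_trial=4.0):
    """Pad one animal's per-trial errors and times out to n_trials.

    Padded trials count as errorless and are timed at the modal
    four seconds per trial; longer records are left unchanged.
    """
    pad = max(0, n_trials - len(errors))
    return errors + [0] * pad, times + [seconds_per_trial] * pad


def middle_mean(values, keep=0.8):
    """Mean of the middle `keep` fraction of the sorted values."""
    ordered = sorted(values)
    cut = int(round(len(ordered) * (1 - keep) / 2))  # scores dropped at each end
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)


def time_curve(time_records, keep=0.8):
    """Per-trial middle-80% mean across animals; records must be equally long."""
    return [middle_mean([rec[t] for rec in time_records], keep)
            for t in range(len(time_records[0]))]


errors, times = extend_record([3, 1], [10.0, 6.0])
print(len(errors), sum(times))  # 16 72.0
```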
In Figure 5, based on error scores taken from Table 10, the learning curves for the various groups may be examined. The curve for the untrained group is the heavier line and is uppermost in all but the first trial. The locus of transfer for any other group will be determined by comparing the curve of that group with the curve for the untrained group. The curves for the adjusted and the handled groups appear to take about the same course, with considerable criss-crossing in the middle portion of the curve. These curves start at about the same point as the curve for the untrained group; thus, for these groups, transfer was not in evidence on the first trial. However, con-
2. The following general factors appear to be important in inducing transfer to the maze:

(a) adjustment, i.e., getting over any ill effects due to shipment, becoming accustomed to the physical environment of the laboratory, and adaptation to the new diet and the rhythm of feeding;

(b) handling, i.e., getting used to being picked up and carried about by the experimenter;

(c) general training, i.e., the development of a general tendency to explore an apparatus vigorously, a tendency which is doubtless due to the association of food with the apparatus.

3. The locus of transfer induced by general factors 
is largely limited to the first five or six trials of the 
second problem, although much less transfer was ap- 
parent on the first trial than occurs in maze-to-maze 
transfer. 
EXPERIMENT II

The purpose of this experiment was to determine the transfer effect of different degrees of overlearning of one maze on the learning of another maze. Wiltbank (28) and others have already shown that transfer effects vary somewhat with the degree of learning of the first maze. No one, however, has yet attempted to find the effects of different degrees of overlearning on transfer.

The two mazes used in this experiment were the same as those described in Experiment I. Maze X was used in the training and Maze Y later for testing. The animals employed were also the same as in Experiment I, each group in this experiment having been selected at random from Groups 1 to 8 of Experiment I. Table 11 shows the degree of overlearning for each
TABLE 14

Reliability of the Difference between the Averages of the Various Groups in Learning Maze Y

Groups      Difference   S.D. of      Corrected S.D.   D/S.D.diff    Chances in 100
compared                 difference   of difference    (corrected)   of a true difference
?           9.93         3.47         3.38             2.94          100.00
E & F       .57          3.85         3.75             .15           55.96


TABLE 15

Showing Average Trials to Learn and Standard Deviations for the Combined Groups in Learning Maze Y

Group    Number of   Average   Standard    S.D. of
         animals               deviation   the average
B        42          20.43     10.46       1.61
C, D     24          28.71     12.22       2.49
E, F     52          18.44     13.37       1.85


ing of the first problem. Transfer was small and positive when the shift to the second problem was made at zero overlearning, i.e., just as soon as the norm was reached on the first. For 10 to 20 trials of overlearning the transfer effect was negative, but for 30 to 40 trials the effect was positive again. The reliability indices for the differences between these groups, as shown in Table 14, are not significant; hence it was decided to combine certain groups. Groups C and D were combined because they were instances of negative transfer and their averages were nearly the same. Groups E and F, which showed positive transfer and had about the same average, were also combined. The averages and standard deviations of the combined groups are shown in Table 15, and the reliability of the differences between the averages in Table 16. From the latter table it may be seen that the difference between Groups B and C-D, or between zero overlearning and 10 to 20 trials of overlearning, was significant. The difference between the latter group and Group E-F, which had 30 to 40 trials of overlearning, was also significant.
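As a check, the S.D. of the difference and the uncorrected ratio can be recomputed from Table 15. This sketch assumes the conventional formula for the difference between two independent averages. The E-F average of 18.44 is read from the scan and agrees with the differences printed in Table 16. The sketch does not apply the reliability correction described in the Discussion.

```python
from math import sqrt

# (number of animals, average trials to learn, standard deviation), from Table 15.
TABLE_15 = {
    "B": (42, 20.43, 10.46),
    "C-D": (24, 28.71, 12.22),
    "E-F": (52, 18.44, 13.37),
}


def sd_of_average(n, sd):
    """Standard deviation of a group average: sd / sqrt(n)."""
    return sd / sqrt(n)


def reliability_of_difference(g1, g2):
    """Return (difference, S.D. of the difference, difference / S.D.)."""
    n1, m1, s1 = g1
    n2, m2, s2 = g2
    diff = abs(m1 - m2)
    sd_diff = sqrt(sd_of_average(n1, s1) ** 2 + sd_of_average(n2, s2) ** 2)
    return diff, sd_diff, diff / sd_diff


for a, b in [("B", "C-D"), ("B", "E-F"), ("C-D", "E-F")]:
    d, sd, cr = reliability_of_difference(TABLE_15[a], TABLE_15[b])
    print(f"{a} vs {b}: D = {d:.2f}, S.D.diff = {sd:.2f}, D/S.D.diff = {cr:.2f}")
```

Because the table entries are rounded, the S.D. of the difference computed here agrees with the printed values only to within about 0.01.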

Something of the nature of the factors involved may be shown by an analysis of the persistence of the behavior pattern from the first to the second maze. Since the first maze or transfer situation was identical for the different groups, it might be supposed that the

TABLE 16

Reliability of the Difference between the Averages of the Combined Groups

Groups          Difference   S.D. of      Corrected S.D.   D/S.D.diff    Chances in 100
compared                     difference   of difference    (corrected)   of a true difference
B & C, D        8.28         2.96         2.89             2.87          100.00
B & E, F        1.99         2.45         2.39             .83           79.67
C, D & E, F     10.27        3.10         3.02             3.40          100.00
TABLE 17

Showing Frequency and Percentage of "A" Errors in the Different Culs-de-Sac for the Combined Groups

Cul-de-sac   Group B          Group C-D        Group E-F
number       Freq.   %        Freq.   %        Freq.   %
1            203     12.7     120     9.4      17?     9.9
2            451     ?        405     31.6     ?       29.1
3            82      5.1      115     9.0      154     7.6
4            557     34.8     515     40.?     694     39.2
5            55      3.4      27      2.1      4?      ?
6            252     15.7     100     7.2      ?       ?
Total        1602             ?                ?

NOTE: The probability that the two distributions are due to the same factors is as follows:

Groups B and C-D       .000015
Groups C-D and E-F     .0156

distribution of errors in the second maze would be the same regardless of the amount of overlearning in the first maze. However, as may be seen by reference to Table 17, this was not the case. In fact, the percentage of errors made in the different alleys varied considerably under the various degrees of mastery. The difference in the type of response made after the several degrees of overlearning can best be shown by applying the Pearson Chi-Square test to the error distributions of Groups B, C-D, and E-F.

The Chi-Square test is a method for determining whether two given curves are significantly different. In most applications of the test, one of the two curves has been a theoretical one, such as the normal curve. The test may also be used, however, to compare two empirical curves, as has been done by Wilson (27). The formula used in the latter case is as follows:
$$\chi^2 = \frac{N_1 N_2}{10{,}000} \sum \frac{(y_1 - y_2)^2}{n_1 + n_2}$$

where $y_1$ and $y_2$ are the percentages of entrances into the different blind alleys, $n_1$ and $n_2$ are the numbers of such entrances, $N_1$ and $N_2$ are the total numbers of entrances in all alleys, and $\Sigma$ is a summation sign. When the formula is applied, the result must be converted into a probability value by the use of Pearson's Tables (15). The value found there indicates the probability that the two curves are the result of the same factors. Thus, the smaller the index, the greater the likelihood that the two curves are produced by different factors.
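The two-distribution form of the Chi-Square test can be sketched in Python. The statistic follows the formula given in the text. For the probability, this sketch substitutes closed-form expressions for the chi-square tail area in place of Pearson's printed tables. It also adopts the modern convention of k - 1 degrees of freedom for k blind alleys, which is an assumption here. The counts in the usage line are hypothetical.

```python
from math import erfc, exp, pi, sqrt


def chi_square_two_samples(counts1, counts2):
    """Wilson's form: N1*N2/10,000 * sum((y1 - y2)**2 / (n1 + n2)).

    counts1, counts2: entrances into each blind alley for the two groups;
    y1, y2 are the corresponding percentages of each group's total.
    """
    total1, total2 = sum(counts1), sum(counts2)
    s = 0.0
    for n1, n2 in zip(counts1, counts2):
        if n1 + n2 == 0:
            continue  # an alley never entered by either group contributes nothing
        y1 = 100.0 * n1 / total1
        y2 = 100.0 * n2 / total2
        s += (y1 - y2) ** 2 / (n1 + n2)
    return total1 * total2 / 10_000 * s


def chi_square_tail(x, df):
    """P(chi-square >= x) for a positive integer number of degrees of freedom."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)**i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return exp(-x / 2) * total
    # Odd df: normal tail term plus a finite series in sqrt(x).
    p = erfc(sqrt(x / 2))
    if df > 1:
        term = sqrt(2 * x / pi) * exp(-x / 2)
        total = term
        for k in range(3, df, 2):
            term *= x / k
            total += term
        p += total
    return p


# Hypothetical entrances into three blind alleys for two groups.
stat = chi_square_two_samples([20, 30, 50], [30, 30, 40])
print(round(stat, 3), round(chi_square_tail(stat, df=2), 3))
```

For a two-group table this percentage form gives the same value as the ordinary contingency chi-square.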

When the method was applied to the distribution of errors in the present data, the following results were obtained. The probability that the underlying factors in the learning of Group B were the same as in Group C-D was equal to .000015. The corresponding value between Groups C-D and E-F was .0156. This seems to mean that the responses made in Maze Y differ significantly between zero and slight overlearning of Maze X. Furthermore, when overlearning of the first maze was greater, the response made on Y tended to be somewhat more like the type made after transfer at mastery, since the index was greater in that case. That is, the first maze habit appears to persist most, and thus to produce the most interference, when transfer follows a slight amount of overlearning. The amount of persistence in that case was greater than when transfer was made at either zero overlearning or with the greatest amount of overlearning used in the present investigation. A possible explanation of these facts has been suggested by Razran (16). He supposes that there may be stages in complex learning corresponding to the “generalization” and “differentiation” phases of specific conditioning. The “generalization” phase of maze learning would presumably be represented by Group C-D, which was transferred somewhat beyond the usual norm of mastery. The “differentiation” phase would then be represented by Group E-F, in which the fixation process had been carried much further. Accordingly, if transfer is made at the period of “generalization,” there would be the greatest amount of habit interference, or persistence, of the first habit. If transfer follows a greater amount of practice, that is, after the first habit has become “differentiated,” there would be less persistence of the original habit in the second maze. The facts at hand seem to correspond approximately to the expectations from this suggestion.

The question now arises concerning the relation between partial learning and overlearning as regards transfer effects. A preliminary study of partial learning was made by Wiltbank and was discussed in the introduction. His general results are not strictly comparable to my own, but it may be pointed out that the mazes he used were only slightly more difficult than those used in the present experiment. The curves based on the two sets of data have been placed side by side in Figure 8 so as to bring out the relation between partial learning and overlearning. When both curves are considered, the following relationships between degree of mastery and transfer effect seem to hold:


1. Slight partial learning (2-? trials)         Small negative transfer
2. Medium partial learning (16-2? trials)       Large positive transfer
3. First maze mastered (norm, 4 of 5)           Small positive transfer
4. Slight overlearning (10-20 trials)           Large negative transfer
5. Large overlearning (30-40 trials)            Medium positive transfer


It may be in order at this point to suggest the probable factors of transfer involved at the various stages of mastery. When transfer was made from different stages of partial learning, general factors appear to have been largely responsible for the transfer effect, for, after an initial drop, the greater the amount of partial learning, the greater the transfer. This trend continued only to a certain point, where specific knowledge elements apparently began to produce interfering effects, and, as a result, the net transfer was reduced. This combination of factors seems to start during the later stage of partial learning and to continue into overlearning. However, specific knowledge elements probably produce greater interfering effects during the early stages of overlearning than at later stages. This would explain the change from negative to positive transfer in the last two stages of overlearning.

FIGURE 8

Showing Percentage of Savings in Transfer from Different Degrees of Partial Learning and of Overlearning

Conclusions 

1. The transfer effect was not found to increase con- 
sistently with greater amounts of overlearning of the 
maze when transfer was made from one maze to an- 
other. The transfer effect was small and positive when 
the shift to the second maze was made at mastery. The 
effect was large and negative when there had been 
slight overlearning of the first maze, but became posi- 
tive, although small, when overlearning had been 
carried further. 

2. The specific transfer value from one maze to 
another was found to differ according to the degree of 
overlearning of the first maze, as shown by a difference 
in the distribution of errors. The first habit tended to 
persist most when overlearning was slight (10-20 
trials), it being greater at this stage of mastery than 
when the shift was made at either mastery or with 
greater overlearning. 



IV 


DISCUSSION AND SUMMARY 

One point of general interest that might be mentioned is the reliability of the maze as a measure of group differences. Hunter (10) has criticized the maze because he found it to be very unreliable, and he concluded that even group differences were not valid. Carr (2), on the other hand, has defended the use of the maze on the ground that inaccuracies of measurement would be positive and negative and that, in a group, they would tend to cancel one another, leaving the average more or less undisturbed. Recently, Tryon (23) has demonstrated statistically that the maze is not only reliable for group differences but also probably fairly accurate for individual differences. In this connection he developed a more accurate method of comparing group averages. It is well recognized that errors of measurement increase the variability of a set of scores, but errors of measurement occur only when the reliability of the measuring device is not unity. Yule (33), Kelley (12), and others have shown that the true variability may be found by multiplying the observed variability by the square root of the reliability coefficient ($\mathrm{S.D.}_{\text{true}} = \mathrm{S.D.}_{\text{obs}}\sqrt{r}$). If differences between averages are to be interpreted accurately, the true variability of the various groups should be used. Therefore, true standard deviations should be substituted for observed standard deviations
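in the reliability-of-the-difference formula, as has been shown by Tryon (20, 21). When these substitutions are made, the formula is as follows:

$$\text{Critical ratio} = \frac{\text{Difference}}{\sqrt{\dfrac{r_1\,\sigma_1^2}{N_1} + \dfrac{r_2\,\sigma_2^2}{N_2}}}$$

where $\sigma_1$ and $\sigma_2$ are the observed standard deviations of the two groups, $N_1$ and $N_2$ their populations, and $r_1$ and $r_2$ the reliability coefficients of the test.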

The use of the corrected formula operates to increase the reliability of the difference. This is as it should be, since in the uncorrected formula the measures of variability are too large owing to errors of measurement derived from an unreliable test. Tryon (21) has definitely shown that this does not involve an overcorrection; in populations of the size here involved, the correction is, on the average, slightly too small.
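Applying the correction to the combined groups of the second experiment shows how corrected reliability indices like those of Table 16 arise. This is a sketch under stated assumptions. It uses the average Maze Y reliability of .951 for both groups and takes the S.D.s of the averages from Table 15. It also treats the "chances in 100" as a one-tailed normal probability, which is an interpretation inferred from the printed values rather than stated in the text.

```python
from math import erf, sqrt


def corrected_critical_ratio(diff, sd_avg1, sd_avg2, r1, r2):
    """Difference divided by the S.D. of the difference computed from true S.D.s.

    Since S.D.true = S.D.obs * sqrt(r), each squared S.D. of an average
    is multiplied by the reliability coefficient of the test.
    """
    return diff / sqrt(r1 * sd_avg1 ** 2 + r2 * sd_avg2 ** 2)


def chances_in_100(ratio):
    """One-tailed normal chances in 100 that the true difference exceeds zero."""
    return 100 * 0.5 * (1 + erf(ratio / sqrt(2)))


R_MAZE_Y = 0.951  # average reliability of Maze Y, obtained through z-values
# (comparison, difference between averages, S.D. of each average)
for label, diff, s1, s2 in [("B vs C-D", 8.28, 1.61, 2.49),
                            ("B vs E-F", 1.99, 1.61, 1.85),
                            ("C-D vs E-F", 10.27, 2.49, 1.85)]:
    cr = corrected_critical_ratio(diff, s1, s2, R_MAZE_Y, R_MAZE_Y)
    print(f"{label}: {cr:.2f}, chances in 100: {chances_in_100(cr):.2f}")
```

Within rounding, the ratios come out at about 2.87, .83, and 3.40.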

The various methods of computing reliabilities of maze scores have been fully discussed by Tolman and Nyswander (19). The prevalent method, and the one used in the present study, consists in correlating total error scores on odd trials with total error scores on even trials. The correlation coefficient found by this method is the reliability for half the test, since only one-half of the scores were correlated with the other half. But the reliability of the whole maze test is wanted; therefore the Brown-Spearman formula must be applied. These various coefficients, one for each group in each experiment, are shown in Tables 18 and 19. The probable errors of these r's are very large. It was decided, therefore, to use a general reliability coefficient for each of the two mazes. In order to get a representative coefficient for the groups on one maze, an average of the r's for the various
groups was computed. But, since a correlation coeffi- 
TABLE 18

Coefficients of Reliability of the Maze for the Various Groups in Experiment I (Maze X)

Group   Number of   Raw r   Corrected r       Probable     z-value
        animals             (Brown prophecy   error of r   for r
                            formula)
1       20          ?       ?                 .06          1.0?
2       20          ?       .51               ?            .56
3       21          ?       .70               .0?          .87
4       19          ?       .71               .08          .89
5       20          .77     ?                 ?            ?
6       16          .70     ?                 .05          ?
7       19          .40     .57               .10          .65
8       ?           ?       .94               ?            1.70
9       1?          .49     .66               .09          .79

NOTE: Average of the nine z-values is 1.002, which corresponds to an r of .762 ± .02.


TABLE 19

Coefficients of Reliability of the Maze for the Various Groups in Experiment II (Maze Y)

Group    Number of   Raw r   Corrected r       Probable     z-value
         animals             (Brown prophecy   error of r   for r
                             formula)
A        20          .89     .94               .02          1.76
B        42          .92     .96               .01          1.91
C & D    24          .83     .91               .04          1.50
E        24          .91     .95               .01          1.84
F        23          .95     .98               .01          2.18

NOTE: Average of the five z-values is 1.84, which corresponds to an r of .951 ± .00?.


cient is not a linear function and is not distributed nor- 
mally, an ordinary average is not accurate, except when 
the coefficients are very low. This error may be 
avoided by use of a method discussed in Fisher (6).
This involves converting each r-value into a z-value. The ordinary average of the z's may then be computed and changed back into an r-value. The average correlation coefficient found in this indirect way is more representative of the group of r's than an ordinary average would be. The z-values for the various r's are also shown in Tables 18 and 19. The formula for z is as follows:

$$z = \tfrac{1}{2}\left[\log_e(1+r) - \log_e(1-r)\right]$$
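Two steps were used for the reliability coefficients. The odd-even correlation was stepped up with the Brown-Spearman formula, and several r's were averaged through z. Both can be sketched in Python. Using `math.atanh` for z is an implementation choice; it equals the formula above with natural logarithms. The function names are hypothetical.

```python
from math import atanh, tanh


def spearman_brown(r_half):
    """Step an odd-even (half-test) correlation up to whole-test reliability."""
    return 2 * r_half / (1 + r_half)


def r_to_z(r):
    """Fisher's z = 1/2 [ln(1 + r) - ln(1 - r)], which equals atanh(r)."""
    return atanh(r)


def average_r(rs):
    """Average correlations by converting to z, averaging, and converting back."""
    zs = [r_to_z(r) for r in rs]
    return tanh(sum(zs) / len(zs))


raw = [0.40, 0.49]  # two raw odd-even r's legible in Table 18
print([round(spearman_brown(r), 2) for r in raw])  # [0.57, 0.66]
# The five z-values printed in Table 19, averaged and converted back to r:
print(round(tanh(sum([1.76, 1.91, 1.50, 1.84, 2.18]) / 5), 3))  # 0.951
```

The conversion matters most for high coefficients like those of Maze Y. There the distribution of r is strongly skewed, and a plain arithmetic mean of the r's would not correspond to the average z.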

When this method is applied in the present study, the resulting reliability coefficients are 0.76 and 0.95 for Mazes X and Y, respectively. The fact that the reliability is higher for the more difficult maze is in agreement with the findings of Tolman and Nyswander and of Tryon. That Maze Y is more difficult than Maze X is indicated by a D/S.D.diff of 5.62 between two groups, 3 and A, which learned the two mazes under the same conditions. Evidently the mazes used in the present experiment are quite reliable, certainly reliable enough to measure group differences. In
general, it appears from the size of the reliability co- 
efficients that the maze compares favorably with many 
other types of tests as regards accuracy of measure- 
ment. 

The identical-elements theory of transfer seems to be the prevailing one in recent psychology. However, the results of the present investigation clearly run counter to this theory as it is generally understood. The usual view seems to be that all transfer effects depend upon the presence of identical elements in the two situations. The present study shows that this is not the case, at least in the animal field. In fact, general factors such as those involved in the mere handling of animals, where no apparatus is used, are evidently effective in producing transfer to a later situation. The same is true of laboratory adjustment, and of adjustment to an apparatus which is
in no way similar to the test situation. The question 
may then arise as to whether handling and other such 
factors are not identical elements, since they occur in 
both the training and the test situations, and in both 
cases these factors remain approximately the same. It 
might be argued that these general conditions are themselves elements. This classification would involve two characteristics, identity and elementary status, which may be regarded as independent of each other. We may admit, without argument, the fact of identity, or a high degree of similarity, between factors in the test and transfer situations, in most cases at least. However, we feel that it is important to insist that these factors, in many cases at least, are general, as that term is usually used in psychology. The theory of identical elements seems to consider a response an element no matter what its degree of generality may be.

The responses which are made to handling and the like are more in the nature of an attitude or a general motor set than a specific overt patterned reaction. These responses are obviously quite general and constitute what might be considered the background of
more specific overt responses. However, the problem 
of distinguishing this aspect of a response from the 
specific patterned response is not always an easy one. 
As a matter of fact, it appears that there are all de- 
grees of responses, since they might be arranged in a 
continuum from the most general attitude to the most 
specific overt response. If attitudes of this sort are to 
be classed as elements, then the identical-elements 
theory loses its distinctiveness entirely. 
Summary

1. A considerable amount of positive transfer to the maze was found to occur from situations involving adjustment, handling, and general training.

2. The amount of transfer was least in the case of general adjustment and somewhat greater under the conditions of handling. It was as great from general training as from previous maze training.

3. The locus of transfer induced by general factors 
is largely limited to the first five or six trials of the 
second problem, although much less transfer was ap- 
parent on the first trial of the test than occurs in maze- 
to-maze transfer. 

4. The transfer effect from maze to maze was some- 
times positive and sometimes negative from the differ- 
ent degrees of overlearning, and showed no consistent 
trend. When transfer was made at mastery, the effect 
was positive; with transfer at slight overlearning the 
effect was negative; but with considerable overlearning 
the transfer effect was positive again. 

5. The amount of transfer from maze to maze with- 
out regard to direction did not consistently increase 
with greater amounts of overlearning. The effect was 
small and positive when transfer was made at mastery 
of the first maze (4 out of 5 perfect trials), but large 
and negative with slight overlearning (10-20 trials), 
and small and positive again with considerable overlearning (30-40 trials).

6. The type of response involved in the transfer effects varied with different degrees of overlearning, as shown by a marked difference in the distribution of errors in the second maze. The pattern of response established in the first maze tended to persist more when overlearning was slight (10-20 trials) than when it was either greater (30-40 trials) or less (simple mastery, 4 out of 5 perfect trials).
GENERAL FACTORS IN THE TRANSFER OF TRAINING IN THE WHITE RAT
(Summary)

This investigation comprised two experiments. The first attempted to determine the transfer value of general training for maze learning. Male white rats were used as subjects in this experiment, and the maze was always the test apparatus. Nine groups of rats, each composed of about twenty rats, were given different kinds of general training without the use of the maze and were then trained in an identical maze. The transfer situations for the different groups comprised general adjustment to the laboratory, handling, training in an apparatus other than a maze, and training in a maze of a form different from that of the test maze. A control group had had no previous experience, general or specific, before this maze test. The results show considerable positive transfer from all the various situations. The transfer effect was greater with a greater amount of general training, the effect of training in a problem box being as great as that of previous training in the maze. These results have been interpreted as evidence that transfer of training can take place through general non-knowledge elements as well as through specific knowledge elements, and to the same degree. The beneficial effects of previous general training seemed largely limited to the first five or six trials of maze learning. In the second experiment, five groups of white rats were given different amounts of overlearning in Maze X and were then tested in Maze Y. One group was transferred to Maze Y after reaching a simple norm (four perfect runs out of five) in Maze X. The other four groups were transferred to Maze Y after ten, twenty, thirty, and forty trials of overlearning, respectively, in Maze X. The control group, of course, learned Maze Y without any previous training in the experiments. It was found that the transfer effects of increasing amounts of overlearning show no consistent trend. The net transfer effect was small and positive for zero overlearning, somewhat larger and negative for a small amount (10 to 20 trials) of overlearning, and medium and positive for a larger amount (30 to 40 trials) of overlearning. Tentative suggestions are offered to explain these results.


Jackson 
GENERAL INFLUENCES IN THE TRANSFER OF TRAINING IN WHITE RATS
(Abstract)

The present study comprised two experiments. The first was an attempt to determine the degree to which general training transfers to the learning of a maze. Male white rats served as subjects in this experiment, and the maze served throughout as the test apparatus. Nine groups of rats, each group consisting of 20 rats, were trained in various general ways without a maze and were then trained on a maze that always remained the same. The transfer situations for these different groups comprised general adaptation to the laboratory and to handling, training on an apparatus that was not a maze, and training on a maze whose pattern differed from that of the actual test maze. There was also a control group that had received no previous training, either general or specific, before being tested on the maze. The findings show fairly strong positive transfer from all the different situations. The more general training the animals had undergone, the more transfer took place, the effect of training on a problem box being as strong as that of previous training on a maze. The author interprets these findings as evidence that transfer of training can be brought about just as well by general non-knowledge elements as by specific knowledge elements. The beneficial effects of the previous general training appeared to be largely limited to the first five or six trials in learning the maze. In the second experiment, 5 groups of white rats were made to overlearn maze X to a greater or lesser degree. These groups were then tested on maze Y. One group was transferred to maze Y after reaching a simple norm of proficiency (4 errorless runs out of 5 trials) on maze X. The other 4 groups were transferred to maze Y after 10, 20, 30, and 40 overlearning trials, respectively, on maze X. The control group, of course, learned the maze without any previous training experience. The transfer effects of increasing degrees of overlearning showed no consistent trend. The net effect of transfer was small and positive with no overlearning, somewhat stronger and negative with a slight degree of overlearning (10 to 20 trials), and moderately strong and positive with a higher degree of overlearning (30 to 40 trials). Some tentative suggestions toward an explanation of these findings are presented.


Jackson 
I

PROBLEM AND METHOD¹

There has been an increasing use of color and color combinations in situations designed to convey messages by means of visual symbols. Familiar examples are magazine advertisements, posters, billboards, and automobile license plates. Two important justifications for the use of colors in advertisements and the like are their attention value and the pleasant feeling-tone which they arouse. In employing colors for these purposes there is danger that the words or other symbols may lack adequate visibility. Whenever colored letters, words, or other symbols are used on either a white or a colored background, those combinations which, in addition to having attention or affective value, favor quick and accurate apprehension should be chosen. No experiment has yet shown whether the attention value or affective value of the colors in which symbols are printed facilitates perception, or investigated certain other influences of color on the apprehension of meaningful characters. Further experimentation and analysis are needed to furnish a more adequate knowledge of the effect of color on visual apprehension and perception.

When the background is of a constant color or brightness, colored symbols may be printed in two ways: (1) with all symbols in the same color (homogeneous color series), or (2) with variation in color from one symbol to another (heterogeneous color series). The purposes of the present experiment are (1) to determine the effect of color upon visual apprehension of (a) homogeneous colored stimuli and (b) heterogeneous colored stimuli; and (2) to make a comprehensive analysis of the influence of color and brightness contrast on visual apprehension and perception in reading. The influence of affective value, attention value, and luminosity of colors on the apprehension of colored letters will be considered in the analysis. Other features of the study include determinations of (1) the effects of scoring methods on the average span of visual apprehension and on the reliability of scores, and (2) the influence of letter position on visual apprehension.

¹The expenses of this study were met by a research grant from the Graduate School, University of Minnesota.

Various ways of measuring the span of visual apprehension have been employed. In general, they may be classified into two techniques: (1) studies in which some form of tachistoscope is used to expose the stimuli for an interval which is shorter than the reaction-time of the eye,² and (2) experiments in which stimuli are exposed from three to six seconds with or without the aid of some simple apparatus. The latter technique was used in this experiment.

There are two main parts to the investigation. In Part I the influence of homogeneous colors, and in Part II the influence of heterogeneous colors, on the apprehension of letters was studied. With the homogeneous colors, 100 sophomore university students served as subjects: 50 men and 50 women.

²For methods and results, see Tinker (26, 28).

In Part I, the stimulus material consisted of series of colored letters pasted on white cardboard, 7 by 28 inches in size. These letters (ordinary block letters) were 3 inches wide and 4½ inches high, and were cut from the standard Milton Bradley colored papers. Eight different colors were used: black,³ orange, violet, blue, red, neutral gray, green, and yellow. Each stimulus card had eight letters, all of the same color. Only consonants were employed, and the position of any letter was systematically rotated so that each appeared an equal number of times in all eight positions on the cards. In all, there were 32 stimulus cards, 4 letter series of each color. The colors on successive stimulus cards were systematically rotated throughout the series.

The subjects were tested in groups of about 30 each. 
The stimulus cards were exposed, one at a time, for a 
period of three seconds. Exposure was achieved by 
merely uncovering and covering the stimulus card in 
a position clearly visible from all parts of the room. 
Timing was done with a stop-clock. A short practice 
series preceded the experiment proper. 

The subjects were instructed to view carefully the 
series of letters on the card while it was exposed. As 
soon as the stimuli were covered the students were to 
write down as many of the letters as they could remem- 
ber and in the same order as on the exposure card. 
Omission of a letter was to be designated by a dash in 
the series. Motivation appeared adequate and pro- 
duced excellent cooperation. 

In Part II, with the heterogeneous colors, there were two groups of subjects: 100 men in one and 100 women in the other. All were university sophomores. Scores for both color preferences and visual apprehension were obtained from these subjects.

³For convenience, all letters are termed colored, although two achromatic stimuli (black and gray) are included in the series.

The same colors were used as in Part I and the stim- 
ulus series were constructed in a similar manner, but 
with the following exceptions: Each letter on any card 
had a different color, i.e., all eight colors appeared on 
each card. The colors were systematically rotated so 
that each color appeared the same number of times (3) 
in each position on the card. In all there were 24 
stimulus series. Presentation of stimuli was the same 
as in Part I. 
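The text says only that the colors were systematically rotated so that each color appeared 3 times in each position across the 24 series. One arrangement that satisfies this, offered here as an assumption rather than the author's actual order, is a cyclic shift of a base color order by one place per series. The sketch builds such a set and checks the balance.

```python
# Hypothetical base order: the eight colors in the order listed in the text.
COLORS = ["black", "orange", "violet", "blue", "red", "neutral gray", "green", "yellow"]


def rotated_series(colors: list[str], n_series: int) -> list[list[str]]:
    """Return n_series color orders, each the base order shifted one place further left."""
    k = len(colors)
    return [[colors[(pos + s) % k] for pos in range(k)] for s in range(n_series)]


series = rotated_series(COLORS, 24)

# Tally how often each color falls in each of the eight card positions.
counts = {(color, pos): 0 for color in COLORS for pos in range(len(COLORS))}
for row in series:
    for pos, color in enumerate(row):
        counts[(color, pos)] += 1

print(sorted(set(counts.values())))  # [3]: every (color, position) pair occurs 3 times
```

Because 24 is a multiple of 8, every shift is used exactly three times, which is what yields the balance; any other Latin-square arrangement repeated three times would serve equally well.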

Stimuli for color preferences in Part II were made by pasting sheets of colored paper, 5 by 8 inches in size, on white cardboard, 8½ by 11 inches. Each card carried a large identifying number below the sheet of colored paper.

Color preferences were obtained by the method of paired comparisons. Possible space errors were controlled by systematic variation in the position of stimuli during succeeding comparisons. The pairs of stimuli were exposed side by side for a three-second interval. From the results an order of preference was obtained for each subject, and ranks 1 to 8 were assigned to the colors. Rank 1 designated the most preferred color.
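With eight colors, the method of paired comparisons yields 28 pairs. The text does not say how each subject's choices were turned into an order of preference; a common approach, assumed here, is to count how many comparisons each color wins and rank by that count. The `prefers` callback and the example subject are hypothetical, and the left/right position control described above is not modeled.

```python
from itertools import combinations
from typing import Callable

COLORS = ["black", "orange", "violet", "blue", "red", "neutral gray", "green", "yellow"]


def ranks_from_paired_comparisons(colors: list[str],
                                  prefers: Callable[[str, str], str]) -> dict[str, int]:
    """Rank colors by the number of pairwise choices each one wins (rank 1 = most preferred).

    Ties in win count keep the input order because sorted() is stable; how the
    original study resolved ties is not stated, so this is an assumption.
    """
    wins = {c: 0 for c in colors}
    for a, b in combinations(colors, 2):  # 8 colors -> 28 pairs
        wins[prefers(a, b)] += 1
    order = sorted(colors, key=lambda c: wins[c], reverse=True)
    return {color: rank for rank, color in enumerate(order, start=1)}


# A hypothetical, perfectly consistent subject whose underlying preference order is known.
true_order = ["blue", "red", "green", "violet", "orange", "yellow", "black", "neutral gray"]


def choose(a: str, b: str) -> str:
    return a if true_order.index(a) < true_order.index(b) else b


ranks = ranks_from_paired_comparisons(COLORS, choose)
print(ranks["blue"], ranks["neutral gray"])  # 1 8
```

A real subject need not be perfectly consistent; circular choices (A over B, B over C, C over A) simply produce equal win counts, which is where the tie rule matters.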
METHODS OF SCORING, RELIABILITY,
AND INFLUENCE OF LETTER POSITION

Methods of Scoring 

The experimental determinations of the span of visual apprehension (often incorrectly called the span of attention) grew out of the early work of Baxt, Cattell, Wundt, Erdmann and Dodge, and others. Since 1900 there has been an ever-increasing number of studies in this field. The numerous investigators of visual apprehension have not only used various procedures and kinds of stimuli but have also employed a variety of methods in evaluating their data. A survey⁴ shows that these procedures differ so greatly that a comparison of the results obtained by different experimenters is hazardous. Three methods of scoring which have frequently been employed are: (1) average spans of visual apprehension computed from raw scores (average number of items correctly reproduced in the proper order per exposure); (2) spans calculated from weighted scores in which full credit is given for correct items and partial credit for partly correct items, i.e., the correct symbol but placed in a wrong position; and (3) the statistical limen (the stimulus value for which there are correct judgments in 50% of the cases). The last method is considered most adequate by both Dallenbach⁵ and Fernberger.⁶

⁴See Tinker (26).
⁵See Guilford and Dallenbach (11).
⁶See Tinker (20).

Since no experimenter has used two or more methods of scoring the same data, the effect of scoring methods on the span of visual apprehension is unknown. Although few would deny that variation in method of scoring alters the span, an experimental determination is needed to yield a quantitative measure of the modification.

In this section a comparison of the spans obtained by scoring the same data by three scoring methods will be made. The three procedures are: (1) average span computed from scores in which a credit of one is given for each item reproduced correctly and in the right place, and, in addition, a credit of one-half for each item reproduced correctly but out of place in the series (Method I); (2) average span calculated from scores in which each item correctly reproduced receives a credit of one irrespective of whether it is or is not in the right place (Method II); and (3) average span derived from scores in which a credit of one is given for each item correctly reproduced and in the right place, but no credit for items reproduced correctly but in wrong positions (Method III).
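The three scoring rules can be stated compactly in code. The sketch below scores a single exposure, assuming the response is written position by position with "-" for omissions (as the subjects were instructed) and that a letter counts as "correct but out of place" when it appears on the card but not at that position. The letters in the example are invented for illustration.

```python
from collections import Counter


def score_exposure(stimulus: str, response: str) -> dict[str, float]:
    """Score one exposure under Methods I, II, and III.

    stimulus: the eight letters on the card, left to right.
    response: the subject's reproduction, position by position, "-" for an omission.
    """
    in_place = sum(1 for s, r in zip(stimulus, response) if s == r)
    # Letters reproduced anywhere, each counted no more often than it appears on the card.
    available = Counter(stimulus)
    correct_anywhere = 0
    for r in response:
        if r != "-" and available[r] > 0:
            available[r] -= 1
            correct_anywhere += 1
    out_of_place = correct_anywhere - in_place
    return {
        "I": in_place + 0.5 * out_of_place,  # full credit in place, half credit out of place
        "II": float(correct_anywhere),       # full credit regardless of order
        "III": float(in_place),              # credit only for correctly placed letters
    }


print(score_exposure("BKMRTZPL", "BKRM-ZQL"))  # {'I': 5.0, 'II': 6.0, 'III': 4.0}
```

Under this reading, a Method I score is exactly the average of the Method II and Method III scores for the same exposure. The group means given later in this section do not satisfy that identity exactly for Groups 2 and 3, so the author's detailed rule for "out of place" items may have differed from this sketch.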

The analysis of spans derived from these three methods of scoring was made for results from three groups of 100 subjects each. With the homogeneous colored stimuli the group of 100 subjects contained both men and women; with the heterogeneous colored letters one group had 100 men, the other, 100 women.

The average span of visual apprehension was first 
computed for each individual and then an average of 
averages for each group as a whole. Comparisons of 
the latter will be made here. 
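The "average of averages" can be sketched as below. The standard error shown alongside each mean in the results is assumed here to be the standard deviation of the subject averages divided by the square root of N; the text does not state which formula was used, so treat that choice, and the toy data, as assumptions.

```python
import math
from statistics import mean, pstdev


def group_mean_and_se(per_subject_scores: list[list[float]]) -> tuple[float, float]:
    """Average each subject's exposure scores, then average those averages.

    Returns (group mean, standard error of the mean). The SE uses the population
    SD of the subject averages over sqrt(N), an assumed convention.
    """
    subject_means = [mean(scores) for scores in per_subject_scores]
    group_mean = mean(subject_means)
    se = pstdev(subject_means) / math.sqrt(len(subject_means))
    return group_mean, se


# Toy data: four hypothetical subjects, two exposures each.
m, se = group_mean_and_se([[5, 6], [6, 7], [7, 8], [4, 5]])
print(m, round(se, 3))
```

Averaging per subject first gives each subject equal weight regardless of the number of exposures, which matches the procedure described in the text.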
The means, with the standard error of each mean, as computed by each method for each group, follow:

Group 1
Homogeneous colors, 100 M+W, Method I       Mean = 5.75 ± .08
Homogeneous colors, 100 M+W, Method II      Mean = 6.22 ± .07
Homogeneous colors, 100 M+W, Method III     Mean = 5.27 ± .08

Group 2
Heterogeneous colors, 100 M, Method I       Mean = 5.56 ± .09
Heterogeneous colors, 100 M, Method II      Mean = 6.08 ± .08
Heterogeneous colors, 100 M, Method III     Mean = 4.97 ± .09

Group 3
Heterogeneous colors, 100 W, Method I       Mean = 5.45 ± .09
Heterogeneous colors, 100 W, Method II      Mean = 5.98 ± .08
Heterogeneous colors, 100 W, Method III     Mean = 4.98 ± .10

The first line of the above results should be read as follows: where homogeneous colored letters were employed as stimuli, and for a group of 100 subjects (men plus women), scoring by Method I (each correct item in proper place = 1, and correct but out of place = ½) yielded an average of 5.75 letters read per exposure, and the standard error of the mean is 0.08. The rest of the results should be read in a similar manner.

The trend of the results is the same for all three groups of subjects. As one might expect, Method II yields a larger average span than Method I, and Method III has the smallest span of all. It is readily seen that there is a marked tendency to perceive and remember more letters than can be placed in correct serial order. When one-half credit is given for correctly reproduced items that are in wrong places, the average span is increased by approximately one-half an item (Method I vs. Method III). Correspondingly, when full credit is given irrespective of the order of the reproduced letters, the span is increased by approximately one item on the average (Method II vs. Method III). The variability is relatively small in all scoring procedures, but the comparative variability is greatest in Method III for all three groups.

The actual sizes of the differences between averages and the significance of the obtained differences are listed below for each of the three groups (D/σ_D is the ratio of a difference to its standard error):

Group 1. Homogeneous Colors (Men and Women)
Method II minus I = 0.47,     D/σ_D = 32.57
Method I minus III = 0.48,    D/σ_D = 33.95
Method II minus III = 0.95,   D/σ_D = 23.04

Group 2. Heterogeneous Colors (Men)
Method II minus I = 0.53,     D/σ_D = 14.72
Method I minus III = 0.58,    D/σ_D = 17.58
Method II minus III = 1.11,   D/σ_D = 21.80

Group 3. Heterogeneous Colors (Women)
Method II minus I = 0.53,     D/σ_D = 14.57
Method I minus III = 0.47,    D/σ_D = 21.36
Method II minus III = 1.00,   D/σ_D = 23.81

These results are remarkably consistent from group to group, and the trends hold irrespective of either the color of the stimulus letters or the sex of the subjects. The directions of the differences are very stable, as shown by the size of the ratio of the difference to the standard error of the difference. The formula for correlated measures was employed to compute the standard error of the difference.
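For two means obtained from the same subjects, the standard error of their difference is usually computed as σ_D = √(σ₁² + σ₂² − 2rσ₁σ₂), where σ₁ and σ₂ are the standard errors of the two means and r is the correlation between the two sets of scores; the critical ratio is then D/σ_D. The sketch below assumes that this is the "formula for correlated measures" meant here. Fed with the rounded values printed in this section, it only approximates the published ratios, because when r is close to 1 small changes in the two-decimal standard errors move σ_D considerably.

```python
import math


def se_difference(se1: float, se2: float, r: float) -> float:
    """Standard error of the difference between two correlated means."""
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)


def critical_ratio(diff: float, se1: float, se2: float, r: float) -> float:
    """Ratio of an obtained difference to its standard error (D / sigma_D)."""
    return diff / se_difference(se1, se2, r)


# Group 1, Method II minus Method III, from the rounded values in the text:
# D = 0.95, SE(II) = .07, SE(III) = .08, r(II, III) = .869
print(round(critical_ratio(0.95, 0.07, 0.08, 0.869), 1))
```

This gives a ratio of about 24 against the printed 23.04; the correlation term is what makes the ratio so large, since ignoring r (setting it to 0) would shrink the ratio to under 9.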

The intercorrelations of the different methods for each of the groups are shown below:

Group 1
Homogeneous colors, I with II                 r = .994
Homogeneous colors, I with III                r = .984
Homogeneous colors, II with III               r = .869

Group 2
Heterogeneous colors (men), I with II         r = .920
Heterogeneous colors (men), I with III        r = .929
Heterogeneous colors (men), II with III       r = .843

Group 3
Heterogeneous colors (women), I with II       r = .920
Heterogeneous colors (women), I with III      r = .973
Heterogeneous colors (women), II with III     r = .916

The above results indicate, in general, that Method I correlates very highly with both II and III. In Group 1 these correlations are .994 and .984, respectively; in Group 2, .920 and .929; and in Group 3, .920 and .973. Although the correlations between Methods II and III are lowest in every group (.869, .843, and .916, respectively), they are still rather high. These coefficients of correlation indicate that the span of visual apprehension is being measured with approximately the same degree of adequacy by all three methods of scoring. Method I, however (in which full credit is given for letters reproduced correctly and in their proper place, and one-half credit for items correctly reproduced but out of place), appears to be slightly more adequate than the others.⁷ This is substantiated by the fact that Method I correlates very highly with either II or III; it gives some credit for every item that is reproduced correctly; and it penalizes lack of ability to place any letter in its proper place in the series. One is justified, however, in employing whichever of Methods I, II, or III yields the apprehension score that appears most significant for the problem under investigation.

In general, therefore, when it is desirable to know the absolute span of apprehension, the scoring method is important; but if the size of a subject's span relative to others in the group is wanted, any one of the three methods of scoring may be employed, with a slight preference for Method I.

⁷The writer has in progress an experiment to compare the statistical limen with each of the methods of scoring used in the present investigation.

Reliability of Scoring

The usefulness of any measuring device depends to a considerable degree upon its reliability. In this experiment the method of computing reliability was limited to correlating half vs. half or odd vs. even scores, since all data from any subject were collected at one sitting. Although a short practice series was given, there was definite adjustment to the experimental situation during the early trials. It was decided, therefore, to compute reliability by correlating the sums of the odd vs. the sums of the even scores. The Brown-Spearman "prophecy" formula was then applied to obtain the reliability of the complete test. The raw and the raised reliability coefficients follow for each of the subgroups (Group 4 is a combination of Groups 2 and 3):

Group 1
Homogeneous colors, Method I                            r = .786, rx = .880
Homogeneous colors, Method II                           r = .798
Homogeneous colors, Method III                          r = .768, rx = .869

Group 2
Heterogeneous colors (men), Method I                    r = .868, rx = .929
Heterogeneous colors (men), Method II                   r = .866, rx = .928
Heterogeneous colors (men), Method III                  r = .819, rx = .900

Group 3
Heterogeneous colors (women), Method I                  r = .908, rx = .952
Heterogeneous colors (women), Method II                 r = .850, rx = .919
Heterogeneous colors (women), Method III                r = .801, rx = .890

Group 4
Heterogeneous colors (men & women), Method I            r = .860, rx = .925
Heterogeneous colors (men & women), Method II           r = .855, rx = .922
Heterogeneous colors (men & women), Method III          r = .805, rx = .892
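The odd-even procedure and the Brown-Spearman correction can be sketched as follows. The helper `pearson_r` and the toy data are illustrative; the correction for a test of double length is r' = 2r / (1 + r). Applied to the raw coefficients listed above, it reproduces the printed raised values (for example, .786 gives .880 and .908 gives .952).

```python
import math
from statistics import mean


def pearson_r(x: list[float], y: list[float]) -> float:
    """Product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half: float, factor: float = 2.0) -> float:
    """Predicted reliability of a test `factor` times as long as the half-test."""
    return factor * r_half / (1 + (factor - 1) * r_half)


def odd_even_reliability(scores_by_subject: list[list[float]]) -> tuple[float, float]:
    """Correlate sums of odd- and even-numbered exposures, then step up to full length.

    scores_by_subject: one list of per-exposure scores for each subject, in presentation order.
    """
    odd_sums = [sum(s[0::2]) for s in scores_by_subject]   # exposures 1, 3, 5, ...
    even_sums = [sum(s[1::2]) for s in scores_by_subject]  # exposures 2, 4, 6, ...
    r_half = pearson_r(odd_sums, even_sums)
    return r_half, spearman_brown(r_half)


for raw in (0.786, 0.768, 0.868, 0.908):
    print(raw, round(spearman_brown(raw), 3))
```

The step-up assumes the two halves are parallel forms; the discussion that follows explains why that assumption is weakest for Group 1.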

The reliability coefficients for Group 1 are definitely lower than in the other groups (5 to 9 points). The cause of this is probably to be found in the nature of the stimuli in Group 1. All letters on any stimulus card were of the same color. Furthermore, colors on successive cards followed each other in an order which placed all stimulus cards of any one color either in the odd or in the even trials. Whatever effect the colors had on apprehension, therefore, influenced the sums of the odd more than the sums of the even scores, or vice versa. This effect would tend to lower the odd-even correlation.
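The argument about Group 1 can be checked with a short sketch. Assuming the eight colors simply recur in a fixed cycle over the 32 cards (the text says only that colors were systematically rotated), an even cycle length means that every card of a given color falls on trials of the same parity, so any color effect loads entirely onto either the odd or the even sum.

```python
N_CARDS = 32
COLORS = ["black", "orange", "violet", "blue", "red", "neutral gray", "green", "yellow"]

# Assumed presentation order: colors recur in a fixed cycle, card 1 taking the first color.
card_colors = [COLORS[i % len(COLORS)] for i in range(N_CARDS)]

# Record whether each color ever appears on an odd-numbered and/or an even-numbered trial.
parity_by_color: dict[str, set[str]] = {}
for trial, color in enumerate(card_colors, start=1):
    parity_by_color.setdefault(color, set()).add("odd" if trial % 2 else "even")

for color, parities in parity_by_color.items():
    print(f"{color:>12}: {sorted(parities)}")
```

With an odd cycle length the colors would alternate between odd and even trials and the confound would disappear, which is one way such a stimulus order could have been arranged to protect the odd-even split.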

Examination of the coefficients in the various subgroups shows that Method I and Method II (in both of which some credit is given for correct reproductions that are in wrong positions in the series) have approximately the same reliability in each of the groups, although there is some variation from group to group. Another characteristic common to every group is the tendency for Method III to be less reliable, by about 6 points, than either Method I or Method II.

The relative standing in reliability of the three methods is made clearer by examining the scoring methods and by reference to some of the original data sheets. Both Method I and Method II make use of the same data in a slightly different manner (full credit for correctly placed items plus one-half credit for those out of place; and full credit for all items irrespective of the order in which they were reproduced) and should therefore have approximately the same reliability. Method III, however, ignores all letters not reproduced in correct serial order. Reference to the original data sheets reveals some rather striking variability from trial to trial in this method of scoring. Many instances occur in which the first one or two letters of a series are reproduced in the correct sequence and then 3 or 4 letters are reproduced correctly but in a wrong order. Then in the succeeding series all letters reproduced (5 to 7) were in the correct order. Many repetitions of this sort of thing reduce the odd-even reliability coefficient, of course. Logically, it would seem that Methods I and II are bound to yield a more adequate measure of visual apprehension than Method III, since they give a weighted credit for partially correct reproduction.

Both the raw and the raised correlation coefficients are high. The raw coefficients range from .768 to .908, with 9 out of 12 above .800. All of the raised coefficients are approximately .900, the range extending from .869 to .952. These reliabilities are satisfactory for the group comparisons to be made in this investigation.

The data on reliability plus an analysis of the scoring 
methods justify the conclusion that Method I is the 
most adequate of these three methods of measuring 
visual apprehension, that Method II is nearly as good, 
and that Method III is least adequate. The relatively 
high intercorrelations between methods, and the size 
of the reliability coefficients justify the added con- 
clusion that any one of the scoring methods may be used 
and still achieve an adequate measure of visual appre- 
hension. 

Influence of Letter Position on Visual 
Apprehension 

As everyone knows, Western Europeans commonly read symbols (words, numbers, etc.) from left to right in a line of print. By the time one has reached the sixth grade or higher, this habit is rather rigidly fixed. It is only to be expected that adults, in apprehending tachistoscopically exposed series of characters, either in sense or in nonsense arrangements, tend to read from left to right and to reproduce correctly more characters at the left end of the series. The influence of character position on visual apprehension probably varies somewhat, however, with length of series as well as with uniformity of length of series.

Crosland and Johnson (6) report data which indicate that, in series of letters, the position at the left end is the most favorable position in the series for correct apprehension, and that each succeeding position toward the right is less favorable than the location at its left. The authors state that "the percentage of letters correct in all respects decreases gradually and consistently from left to right." They employed letter series ranging from 3 to 10 items in length. Various studies of visual apprehension show that series 3 to 5 items in length are frequently apprehended entirely correctly (5, 27).

The results of Crosland and Johnson would have 
been more adequate if they had analyzed each letter 
series of a given length by itself. Even then, the 
shorter series would not give an accurate picture of the 
influence of letter position unless the length of the 
series represented approximately the average span of 
the individual. Any test that is too easy does not 
measure discriminatively. 

The writer considers that to obtain an accurate picture of the relative influence of letter position on range of visual apprehension, the letter series should be uniform in length and equal to or slightly greater than the subject's average span. If these conditions are fulfilled, it is probable that results somewhat different from those of Crosland and Johnson will be found. Although these authors state that there was a "strong tendency in all Subjects to succeed in catching the last letter on the card" (at the right), their method of computing the percentage of maximum score correct does not reveal this trend. Also, in Crosland's supplementary graph (5) there appears to be too rapid a drop in the curve for earned absolute scores (and also for earned percentage scores) in the first few letter positions.

Because of the inconclusive evidence available, the responses of 100 men and of 100 women to 24 series of 8 letters each were scored in two different ways and the influence of letter position on visual apprehension determined. This was the series of heterogeneous colored stimuli described in Chapter I. The two methods of scoring used have been described above: (1) In Method II, every letter reproduced correctly was given a credit of 1 irrespective of the order of reproduction. For the present purpose each correctly reproduced letter was credited to the position it held in the stimulus series. (2) In Method III a credit of 1 was given to each letter reproduced correctly and in the right place. No credit was given for misplaced letters.
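Crediting letters to positions under the two rules just described can be sketched as follows. The sketch assumes that the letters on a card are all different and that omissions are written as "-"; the example letters are invented. Summing a subject's credits over the 24 exposures gives a score from 0 to 24 for each position, the scale used in Table 1.

```python
def position_credits(stimulus: str, response: str) -> tuple[list[int], list[int]]:
    """Per-position credits for one exposure under Methods II and III.

    Method II: the letter at position i earns 1 if it appears anywhere in the response.
    Method III: it earns 1 only if the response has it at the same position.
    """
    reproduced = set(response) - {"-"}
    method_ii = [1 if s in reproduced else 0 for s in stimulus]
    method_iii = [1 if i < len(response) and response[i] == s else 0
                  for i, s in enumerate(stimulus)]
    return method_ii, method_iii


def position_totals(exposures: list[tuple[str, str]]) -> tuple[list[int], list[int]]:
    """Sum per-position credits over all (stimulus, response) exposures for one subject."""
    n = len(exposures[0][0])
    totals_ii, totals_iii = [0] * n, [0] * n
    for stimulus, response in exposures:
        ii, iii = position_credits(stimulus, response)
        totals_ii = [t + c for t, c in zip(totals_ii, ii)]
        totals_iii = [t + c for t, c in zip(totals_iii, iii)]
    return totals_ii, totals_iii


print(position_credits("BKMRTZPL", "BKRM-ZQL"))
# ([1, 1, 1, 1, 0, 1, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
```

Crediting each reproduced letter to the position it held on the card, rather than where the subject wrote it, is what lets Method II show the positional effect even though it ignores order.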

Since the average span of the group for these series
was between 5 and 6 letters, the eight-letter series was 
slightly greater than the average. For individual mem- 
bers of the group the average spans ranged from ap- 
proximately 3.5 to 7.5. Inspection of the original data 
sheets revealed that all eight letters of a series were 
reproduced correctly only infrequently. 

The average number of times that the letters occurring in each position were apprehended correctly was computed by scoring Methods II and III for 100 men, for 100 women, and for the combined group. The highest possible score for any position is 24. The results for the two subgroups are given in Table 1.

TABLE 1
The Influence of Letter Position on the Range of Visual Apprehension
Highest Possible Score = 24

                         100 Men                              100 Women
Position       Method II          Method III          Method II          Method III
in series*   Mean   σdist.      Mean   σdist.       Mean   σdist.      Mean   σdist.
    1        23.0    1.5        22.6    2.6         22.8    1.6        22.3    1.9
    2        22.4    2.1        20.5    2.7         22.2    1.9        20.2    2.8
    3        21.7    2.5        19.0    3.3         20.8    2.4        17.9    3.3
    4        20.5    2.7        17.6    3.7         19.8    3.1        16.9    4.3
    5        17.3    3.3        12.8    4.2         15.9    3.9        12.1    4.8
    6          ?      ?         10.4    4.8         15.4    4.4        10.4    5.0
    7        11.9    6.3         7.6    5.5         12.4    5.2         8.5    5.2
    8        12.5    6.4         9.2    6.1         13.6    5.5        10.3    5.5

*Letter position number 1 is at the left end of the series.

Inspection of the first row in the table shows that letters in position 1 are apprehended correctly approximately 95% of the time (score of 24 = 100%), irrespective of scoring method or the sex of subjects. Changes in scores with changes in letter position reveal similar trends for men and for women, but the scores for women are slightly smaller than for men except in positions 7 and 8, where they are slightly greater. In general, there is in both methods of scoring a decrease in score from each letter position to the succeeding one through position 7. In position 8 there is always a slight increase in average score.

When results for the men and the women are combined, the average scores for succeeding letter positions are:

Letter Position     1      2      3      4      5      6      7      8
Method II         22.9   22.3   21.3   20.2   16.6   15.6   12.1   13.1
Method III        22.5   20.3   18.4   17.2   12.4   10.4    8.1    9.7

There are variations in constancy of trends from one scoring method to the other. These trends are best shown by noting the decrease or increase in average score from one letter position to another in each method of scoring. Differences obtained by subtracting the score in one position from that in the preceding are given below for the combined group (100 men plus 100 women) as well as for each subgroup (to be read: score in position 1 minus score in position 2 equals 0.6, etc.):


Differences between:   1-2    2-3    3-4    4-5    5-6    6-7    8-7
Men, Method II         0.6    0.7    1.2    3.3    1.4    3.9   +0.6
Women, Method II       0.6    1.4    1.0    3.9    0.5    3.0   +1.2
Men, Method III        2.1    1.5    1.4    4.8    2.4    2.8   +1.6
Women, Method III      2.1    2.3    1.0    4.8    1.7    1.9   +1.8
M + W, Method II       0.6    1.0    1.1    3.6    1.0    3.5   +0.9
M + W, Method III      2.2    1.9    1.2    4.8    2.0    2.3   +1.6
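The combined-group rows of the table above can be recomputed from the combined averages listed earlier, as in the sketch below. Because those averages are themselves rounded to one decimal, recomputed differences can be off by 0.1; for Method II the 8-7 difference comes out as 1.0 rather than the printed +0.9.

```python
# Combined-group (100 men + 100 women) averages by letter position 1-8, from the text.
COMBINED = {
    "II": [22.9, 22.3, 21.3, 20.2, 16.6, 15.6, 12.1, 13.1],
    "III": [22.5, 20.3, 18.4, 17.2, 12.4, 10.4, 8.1, 9.7],
}


def position_changes(scores: list[float]) -> tuple[list[float], float]:
    """Drops from each position to the next through position 7, plus the rise from 7 to 8."""
    drops = [round(scores[i] - scores[i + 1], 1) for i in range(6)]  # 1-2 through 6-7
    rise = round(scores[7] - scores[6], 1)                            # 8-7
    return drops, rise


for method, scores in COMBINED.items():
    drops, rise = position_changes(scores)
    print(f"Method {method}: drops {drops}, rise 8-7 = +{rise}")
```

Rounding each difference to one decimal suppresses the floating-point noise of subtracting values such as 22.9 and 22.3.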


In either method of scoring these data show the same tendency with men subjects, with women subjects, and with men and women combined. This demonstrates a consistency of general trend from group to group. The following discussion is based on the results for the combined group (M + W).

While both methods show a decrease in score at each
succeeding position from position 1 through 7 and then
an increase at 8 (indicated by +), the amount of
change is always less in Method II, with the single
exception of the sixth to seventh position, where it is
greater in Method II. This difference in amount of
change is probably due to the method of scoring. It will
be remembered that Method II gave credit for reproduced
letters irrespective of order of reproduction.
Table 1 shows that scores in II are slightly greater than
in III for every letter position. It is evident that
ability to reproduce letters in their correct sequence
decreases more rapidly from position to position than
ability to reproduce letters irrespective of sequence.
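The comparison just made can be checked mechanically against the M + W rows of the table of differences. The Python sketch below is an illustrative check added here, not part of the original analysis; it copies those two rows and lists every transition at which the Method II change is at least as large as the Method III change.

```python
# Successive-position differences for the combined group (M + W), copied
# from the table of differences. Positive entries are decreases; the last
# column ("8-7") is the increase at position 8.
TRANSITIONS = ["1-2", "2-3", "3-4", "4-5", "5-6", "6-7", "8-7"]
METHOD_II = [0.6, 1.0, 1.1, 3.6, 1.0, 3.5, 0.9]
METHOD_III = [2.2, 1.9, 1.2, 4.8, 2.0, 2.3, 1.6]


def transitions_where_ii_not_smaller(ii, iii, labels):
    """Labels of transitions where the Method II change (in absolute size)
    is at least as large as the Method III change."""
    return [lab for lab, a, b in zip(labels, ii, iii) if abs(a) >= abs(b)]


exceptions = transitions_where_ii_not_smaller(METHOD_II, METHOD_III, TRANSITIONS)
print(exceptions)  # ['6-7']
```

The single exception returned is the sixth-to-seventh transition, as the text states.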

In Method II there is a slight decrease in score from
position 1 to 2, 2 to 3, and 3 to 4. From 4 to 5 there
is a decided decrease, more than 3 times the preceding
amounts. Then comes a lesser decrease from 5 to 6,
another large decrease from 6 to 7, and, finally, a small
increase from 7 to 8. In comparison with II, Method
III reveals much larger decreases from position 1 to 2,
2 to 3, 4 to 5, and 5 to 6 and a greater increase from 7
to 8. Positions 6 to 7 show, however, a smaller decrease
in Method III. Both methods show a marked
drop in score from position 4 to 5. The same is true from
position 6 to 7. The amounts of change in score from
positions 1 to 2 and 7 to 8 contrast markedly in the two
methods of scoring, for, as shown above, the change is
much less in Method II.
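The statement that the drop from 4 to 5 is more than 3 times the preceding amounts can be verified from the M + W, Method II row of the table of differences. This Python snippet is only an arithmetic check on those printed values.

```python
# Method II differences for the combined group (M + W), from the table above.
method_ii = {"1-2": 0.6, "2-3": 1.0, "3-4": 1.1, "4-5": 3.6}

# How many times larger is the 4-5 drop than each of the three earlier drops?
ratios = {k: round(method_ii["4-5"] / v, 2) for k, v in method_ii.items() if k != "4-5"}
print(ratios)                               # {'1-2': 6.0, '2-3': 3.6, '3-4': 3.27}
print(all(r > 3 for r in ratios.values()))  # True
```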

The findings in this study differ somewhat from those
reported by Crosland and Johnson (6) and by Crosland
(5). The former reported that “the percentage
of letters correct in all respects decreases gradually and
consistently from left to right.” Curves based on their
data, but printed in the latter study, show a rather rapid
drop from position 1 through 5 and then a much
slower drop from position 5 through 10. The rapid
drop in the first half of these curves (5, p. 377) apparently
is a function, in part at least, of the length
of the stimulus series. The inclusion of series of 3, 4, and
5 letters with the longer ones, as stated in another discussion
(27), undoubtedly favored the accumulation of
high frequencies of correct responses in the first 4 or
5 positions in the letter series. Such an effect would
increase the rapidity of the drop in the first part (left
half) of the curves.

In the present investigation, curves drawn from the 
data in Table 1 would contrast markedly with those of 
Crosland. Although the drop in average score in our 
study is somewhat more rapid in Method III than in 
II, both show a consistent and very gradual drop from 
letter position 1 to 4. From position 4 to 5 and 6 to 7 
occur marked drops followed by a rise from 7 to 8. 
While the exposure interval in the present experiment
was 3 seconds and in Crosland’s 150σ, the writer considers
the differences in results to be largely due to the
use of stimulus series of various lengths in the latter
study and of series of a constant length, slightly
above the average span of the subjects, in the present
investigation.

The change in average score in successive letter 
positions in our study may be explained as follows: 
The accuracy of apprehension of the first 3 or 4 letters 
in any series was high for all subjects. This, combined 
with the habitual tendency to apprehend any series of 
characters from left to right, led to a rather gradual 
decrease in letters correctly apprehended. Since the 
average for the group lies between 5 and 6 letters, a 
concentration of individual average spans occurs 
around 5. This led to an increased concentration of 
mistakes at position 5. In position 6, which is also 
close to the group average, one would expect only a
few more mistakes than at 5. Position 7, however, is
definitely beyond the span of all but a very small percentage
of the group, and consequently another marked
drop in average score is found. The slight increase
in correct reproduction of letters in position 8 is
due to a tendency manifested by many of the subjects
to apprehend the last item of a series even though they
had missed two or three letters preceding it. Evidently
the end position favors more accurate reproduction
than the adjoining location just within either end of the
series.

One is justified in concluding that letter position has 
a definite effect on visual apprehension. From left to 
right there is a decrease in the average number of let- 
ters correctly reproduced in each succeeding letter 
position through the seventh and then a slight increase 
in the last position. The decrease from the first to the
fourth is constant and gradual. There are rapid drops
in score from the fourth to fifth and sixth to seventh
positions, and always an increase at the eighth position.
These produce marked irregularities in the consistency
of trend from position to position in the series. Both
absolute and relative variability of scores increase consistently
from letter position 1 through 8 (Table 1).



EFFECT OF COLOR ON APPREHENSION 
OF LETTERS 


In studying the effect of color on the apprehension of
letters printed on a uniform background, two arrangements
of stimuli must be considered: (1) series in
which all letters on a single stimulus card are of the
same color, i.e., homogeneous colors; and (2) variation
of color from one letter to another on the same stimulus
card, i.e., heterogeneous colors. There is considerable
difference between the two situations. In the former
no difference in hue or brightness of color occurs
from symbol to symbol in any single series of letters.
In the latter, however, there is a simultaneous change
in both hue and brightness with each succeeding symbol.
The influence of color on apprehension may
be different in the two situations.

The object of the investigation reported in this
chapter was to determine the effect of both homogeneous
and heterogeneous colors on visual apprehension
of letters when the background of the stimuli is constant
(white).*

In the analysis of results, color preference, attention 
value, and luminosity, as well as hue of the colors, have 
been considered as possible factors involved in produc- 
ing differences in apprehension. 

*When the colors of both background and symbols are varied, a
somewhat different situation is produced. The effect of color combinations
on perception has been investigated in another experiment
which will be reported in Chapter IV.

Homogeneous Colors

In this part of the study, 32 series of colored stimuli,
each containing 8 letters, were read. There were 4
series of each of the following eight colors: violet, red,
green, gray, orange, black, blue, yellow. One hundred
subjects (50 men and 50 women) served as observers in the
experiment. The procedure has been described in
Chapter I.

TABLE 2

Visual Apprehension of Homogeneously Colored Stimuli

N = 100 (50 men + 50 women)

Color       Rank    Mean    σM
Violet      1       6.50    0.09
Red         2       6.43    0.09
Green       3.5     6.34    0.10
Gray        3.5     6.34    0.11
Orange      5       6.31    0.10
Black       6       5.97    0.09
Blue        7       5.93    0.08
Yellow      8       5.70    0.09
Average*    -       6.22    0.07

*This average was computed from the original data.

The average score (span) of each subject for 
each color was computed by Scoring Method II 
in which each letter correctly reproduced was 
given a score of one, irrespective of the order of repro- 
duction. Each group mean, therefore, is an average 
of 400 readings. In scoring for effect of color on ap- 
prehension, it is necessary to give equal weight to each 
reproduced letter, as is done in Scoring Method II.
As noted in Chapter II, when all 32 series of stimuli
are considered, this method of scoring has a reliability
coefficient of .89. When only 4 series (one color) are
employed, the consistency of performance is considerably
lessened. The reliability for 4 series is represented by a
coefficient of .55, which is large enough, however, to justify
group comparisons of the data involved.
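The drop from .89 to .55 is roughly what the Spearman-Brown prophecy formula would predict when a test is cut to one-eighth of its length (4 of 32 series). The source does not say how the .55 was obtained, so the Python sketch below is only a cross-check under that assumption, not the author's procedure.

```python
def spearman_brown(r, k):
    """Spearman-Brown prophecy: predicted reliability when a test of
    reliability r is lengthened (k > 1) or shortened (k < 1) by factor k."""
    return k * r / (1 + (k - 1) * r)


# Reported reliability for all 32 series under Scoring Method II.
r_32 = 0.89
# A single color uses 4 of the 32 series, i.e. one-eighth of the full length.
r_4_predicted = spearman_brown(r_32, 4 / 32)
print(round(r_4_predicted, 2))  # 0.5
```

The predicted value of about .50 is close to the reported .55, which suggests the lower reliability for a single color is largely a consequence of the shorter test.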

Table 2 contains the basic data for visual apprehen- 
sion of homogeneously colored letters. Column 1 shows 
the various colors used, Column 2, the relative rank 
of mean span of apprehension listed from largest to 
smallest, Column 3, the mean number of letters cor- 
rectly apprehended per exposure, and Column 4, the 
standard errors of the means. The latter are all comparatively
small and show little variation from color
to color, the range being from 0.08 to 0.11.

Examination of Column 3 reveals that the mean 
spans fall into 4 groups. At the top of the list stand 
violet and red with average spans of 6.50 and 6.43, re- 
spectively. In the next lower group, green, gray, and 
orange have approximately equal spans. The means 
are 6.34, 6.34, and 6.31, respectively. There is then a 
considerable gap to black and blue, which stand close 
together with average spans of 5.97 and 5.93. Finally, 
yellow stands at the bottom of the list with a span of 
only 5.70. The last row of the table gives the total
average of 6.22 for all colors (32 series per subject).

The ranking of the colors in Column 2 is apt to be
somewhat misleading unless the differences between
averages are considered. Since green and gray differ
from each other by only .003, they have been given the same
rank, but orange, which differs from either green or
gray by only .03, receives a different rank. The ranges
of differences between the mean span for each color and
those for the others are given below:

Violet   .07 to .80      Orange   .03 to .61
Red      .07 to .73      Black    .04 to .53
Green    .003 to .64     Blue     .04 to .57
Gray     .003 to .64     Yellow   .23 to .80
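The ranges listed above can be reproduced from the means in Table 2. In this Python sketch (an illustrative check, not the author's procedure) the only mismatch is green against gray: the rounded means are identical, so the computed minimum is .00, whereas the author's .003 comes from the unrounded data.

```python
from itertools import combinations

# Mean spans from Table 2, rounded to two decimals as printed.
MEANS = {
    "Violet": 6.50, "Red": 6.43, "Green": 6.34, "Gray": 6.34,
    "Orange": 6.31, "Black": 5.97, "Blue": 5.93, "Yellow": 5.70,
}


def difference_ranges(means):
    """For each color, the smallest and largest absolute difference
    between its mean and the mean of every other color."""
    diffs = {color: [] for color in means}
    for a, b in combinations(means, 2):
        d = round(abs(means[a] - means[b]), 2)
        diffs[a].append(d)
        diffs[b].append(d)
    return {color: (min(v), max(v)) for color, v in diffs.items()}


for color, (lo, hi) in difference_ranges(MEANS).items():
    print(f"{color:7s} {lo:.2f} to {hi:.2f}")
```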

An adequate interpretation of these differences rests 
on a knowledge of the ratios of the differences to the 
standard error of the differences. These ratios are 
listed in Table 3. The formula for correlated measures 
was employed to compute the standard errors of the 
differences, since the spans for the various colors correlated
with each other. These coefficients of correlation
ranged from .40 to .66, with an average of .55.
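The formula for correlated measures is, in its usual form, σD = √(σ1² + σ2² - 2 r12 σ1 σ2), where σ1 and σ2 are the standard errors of the two means and r12 is their correlation. The Python sketch below applies it under one stated assumption: the pairwise correlations are not printed, so a single illustrative r of .55, near the middle of the printed range of .40 to .66, stands in for all of them. For violet against yellow it gives a ratio of about 9.4, close to the printed value (whose legible digits read 9.8). It also puts the difference needed for a ratio of 3.00 near .26, consistent with the "about .25" mentioned further on.

```python
import math


def se_difference(se1, se2, r):
    """Standard error of the difference between two correlated means."""
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)


def critical_ratio(mean1, mean2, se1, se2, r):
    """Difference between two means divided by its standard error (D / sigma_D)."""
    return abs(mean1 - mean2) / se_difference(se1, se2, r)


# Violet vs. yellow, with means and standard errors from Table 2 and an
# ASSUMED correlation of .55 (the individual r values are not printed).
ratio = critical_ratio(6.50, 5.70, 0.09, 0.09, r=0.55)
print(round(ratio, 2))  # 9.37

# Smallest difference that gives D / sigma_D = 3.00 under the same assumptions.
needed = 3 * se_difference(0.09, 0.09, 0.55)
print(round(needed, 3))  # 0.256
```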


TABLE 3

The Ratios D/σD for the Differences between the Averages
Given in Table 2

          Violet  Red    Green  Gray   Orange  Black  Blue   Yellow
Violet    -       0.93   1.94   1.56   2.25    6.?5   6.77   9.8?
Red       0.93    -      1.03   0.?2   1.34    5.58   5.43   8.23
Green     1.94    1.03   -      0.03   0.33    4.78   4.77   6.74
Gray      1.56    0.?2   0.03   -      0.23    3.27   4.00   5.94
Orange    2.25    1.34   0.33   0.23   -       4.45   4.27   7.07
Black     6.?5    5.58   4.78   3.27   4.45    -      0.40   3.17
Blue      6.77    5.43   4.77   4.00   4.27    0.40   -      2.83
Yellow    9.8?    8.23   6.74   5.94   7.07    3.17   2.83   -

? = digit illegible or inconsistent in the scanned source.

Seventeen of the 28 differences between means are
of relatively high significance. Sixteen have a D/σD
of 3.17 or greater, and the seventeenth has a ratio of 2.83.
One important trend in Table 3 should be noted. The
average span for yellow, which is by far the smallest
of the eight, consistently shows a statistically very reliable
difference from all the other means. Black and
blue are the only other two means which reveal a large
percentage of highly reliable differences. Each of the
others (violet, red, green, gray, and orange) has only
three differences of relatively high statistical reliability.

A difference had to be about .25 or above for the
D/σD to equal 3.00 or more. There are, as demonstrated by
these results, significant differences between apprehension
scores of certain colored letters.
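The counts in the two preceding paragraphs can be checked by tabulating the 28 ratios of Table 3. The Python sketch below does so; three cells are partly illegible in the scanned table, and each is entered at a deliberately extreme reading (noted in the comments) that cannot change whether it reaches 3.00.

```python
# Upper triangle of Table 3: D / sigma_D for the homogeneous-color means.
TABLE_3 = {
    ("Violet", "Red"): 0.93, ("Violet", "Green"): 1.94, ("Violet", "Gray"): 1.56,
    ("Violet", "Orange"): 2.25,
    ("Violet", "Black"): 6.05,   # second digit illegible; lowest possible reading
    ("Violet", "Blue"): 6.77,
    ("Violet", "Yellow"): 9.80,  # last digit illegible; lowest possible reading
    ("Red", "Green"): 1.03,
    ("Red", "Gray"): 0.92,       # middle digit illegible; highest possible reading
    ("Red", "Orange"): 1.34, ("Red", "Black"): 5.58, ("Red", "Blue"): 5.43,
    ("Red", "Yellow"): 8.23,
    ("Green", "Gray"): 0.03, ("Green", "Orange"): 0.33, ("Green", "Black"): 4.78,
    ("Green", "Blue"): 4.77, ("Green", "Yellow"): 6.74,
    ("Gray", "Orange"): 0.23, ("Gray", "Black"): 3.27, ("Gray", "Blue"): 4.00,
    ("Gray", "Yellow"): 5.94,
    ("Orange", "Black"): 4.45, ("Orange", "Blue"): 4.27, ("Orange", "Yellow"): 7.07,
    ("Black", "Blue"): 0.40, ("Black", "Yellow"): 3.17,
    ("Blue", "Yellow"): 2.83,
}

high = [pair for pair, ratio in TABLE_3.items() if ratio >= 3.0]
print(len(TABLE_3), len(high))                      # 28 16
print(min(TABLE_3[pair] for pair in high))          # 3.17
print(max(r for r in TABLE_3.values() if r < 3.0))  # 2.83

# Number of ratios of 3.00 or more that involve each color.
counts = {}
for a, b in high:
    counts[a] = counts.get(a, 0) + 1
    counts[b] = counts.get(b, 0) + 1
print(counts)
```

Yellow and black each take part in six such ratios and blue in five, while violet, red, green, gray, and orange each take part in three, matching the description in the text.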

Among the factors which might have influenced ap- 
prehension of the colored letters, color preference, at- 
tention value, and luminosity should be listed. If the 
rankings for preference and attention value obtained 
with other subjects than those employed in this experi- 
ment are accepted as approximately valid, comparisons 
may be made with our data. The ranking for color preference
was obtained from a combined group of 100 men
and 100 women (see Heterogeneous Colors below),
and that for attention value from a group of 119 men
and women (Adams, 1, pp. 118-119). These rankings,
together with that for luminosity [determined by DeCamp's
method (7)], are given in Table 4.

TABLE 4

Rankings for Spans of Colored Letters, Color Preference,
Attention Value, and Luminosity of Colors

Span            Preference     Attention value   Luminosity
Violet (1)      Blue (1)       Orange (1)        Yellow (1)
Red (2)         Green (2)      Red (2)           Green (2)
Green (3.5)     Red (3)        Blue (3)          Orange (3)
Gray (3.5)      Orange (4)     Black (4)         Gray (4)
Orange (5)      Violet (5)     Green (5)         Blue (5)
Black (6)       Yellow (6)     Yellow (6)        Violet (6)
Blue (7)        Black (7)      Violet (7)        Red (7)
Yellow (8)      Gray (8)       Gray (8)          Black (8)
In the first column of Table 4, reading from top to
bottom, are given the rankings for spans of visual apprehension
for colored letters. The numbers in parentheses,
following the colors, designate the rank. Violet
has the largest span and is given a rank of 1. In Column
2, blue is the most preferred color; in Column 3,
orange has the greatest attention value; and in Column
4, yellow possesses the greatest luminosity. By luminosity
is meant the percentage of white in the color. For
example, white has 100 per cent luminosity; black, zero
per cent.

A comparison of the relative positions of each color
in the successive columns reveals a striking lack of relationship
between most of the rankings. There appears
to be a trend toward a negative relationship between
span for colored letters and luminosity, and
toward a positive relation between preference (affective
value) and attention.* The relationships are more
adequately revealed by correlations between the rankings.
They are given below:


Color span vs. color preferences      ρ = +.021
Color span vs. attention value        ρ = -.153
Color span vs. luminosity             ρ = -.351
Preferences vs. attention value       ρ = +.527
Attention value vs. luminosity        ρ = -.190


The first correlation coefficient of .021 shows that no
relation exists between size of span for letters of a cer- 
tain color and preference for that color. 
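The coefficients are labeled ρ, which suggests rank-difference correlations, but the formula is not printed. The Python sketch below therefore assumes the usual form ρ = 1 - 6Σd²/(n(n² - 1)), with tied ranks entered as averages (3.5). Applied to the span and luminosity rankings of Table 4 it reproduces the printed -.351. The same formula gives about +.03 rather than +.021 for span against preference, so the author's exact procedure (tie handling, for instance) may have differed slightly.

```python
def spearman_rho(ranks_x, ranks_y):
    """Rank-difference correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
    for two dicts that map the same items to their ranks."""
    n = len(ranks_x)
    d2 = sum((ranks_x[k] - ranks_y[k]) ** 2 for k in ranks_x)
    return 1 - 6 * d2 / (n * (n * n - 1))


# Ranks from Table 4 (1 = largest span / most luminous); ties share 3.5.
span = {"Violet": 1, "Red": 2, "Green": 3.5, "Gray": 3.5,
        "Orange": 5, "Black": 6, "Blue": 7, "Yellow": 8}
luminosity = {"Yellow": 1, "Green": 2, "Orange": 3, "Gray": 4,
              "Blue": 5, "Violet": 6, "Red": 7, "Black": 8}

print(round(spearman_rho(span, luminosity), 3))  # -0.351
```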

There appears to be a very slight tendency for colors
of high attention value to coincide with colored letters
having a short span of visual apprehension. The coefficient
of -.153 is so small, however, that it could be
more adequately interpreted as signifying no relationship
between attention value and span.

*Only slight sex differences exist in either attention value or affective
value of colors. The ranking for men correlates +.762 with
that for women for the former and +.976 for the latter (see Heterogeneous
Colors below).

The coefficient of -.351 indicates an appreciable tendency
for a high percentage of luminosity to be accompanied
by a small span for letters of a given color. This
trend is easily discovered in Table 4. Yellow, the most 
luminous color, has the shortest span. And violet, 
which has the greatest span, is sixth in luminosity, close 
to the non-luminous end. Furthermore, red, which has 
next to the largest span, is next to the least luminous 
color. 

All the colored letters were on a white background. 
The less the luminosity of a color, therefore, the greater 
the contrast in brightness between color and back- 
ground. Yellow, with approximately 76 per cent lumi- 
nosity, shows small contrast with white, but violet and 
red, with about 8 and 5 per cent, respectively, produce 
marked contrast with the background. Apparently 
this difference of contrast in brightness between color 
of letters and background affects span of apprehension 
to a certain degree. Black (with zero per cent luminos- 
ity) and blue are the only colors markedly misplaced in 
the tendency for size of span to correlate negatively 
with luminosity. If black and blue are excluded from
the series, the correlation between size of span and
percentage of luminosity becomes -.901. As one
would expect, the greatest effects on apprehension occur
with colors of very high or very low percentage of
luminosity, i.e., violet and yellow, especially yellow,
where the span differs significantly from all the others.
Green, although apparently somewhat displaced in a
rank correlating negatively with luminosity, has an
average which differs from orange by only .03 of a
letter. Its actual displacement is therefore slight.

One is justified in concluding that luminosity of
colors has some effect on apprehension of homogeneously
colored letters. In general, the greater the luminosity
the smaller the span of apprehension. Hue
of colors alone probably has little effect on visual apprehension
score. The sizes of the spans for blue and
black, which represent marked exceptions to this generalization,
are unexplainable, either in terms of the
method of the experiment or from other data at hand.
Color preference correlated with attention value
yielded a coefficient of .527, indicating a fair amount
of relationship. Attention value, however, correlates
with luminosity by only -.190. These relationships
will receive further attention in our discussion of
heterogeneous colors.

Heterogeneous Colors 

In this part of the investigation, 24 series of colored 
stimuli, each containing 8 differently colored letters, 
were read. They were the same colors employed in 
the homogeneous series. Each color appeared a single 
time on each stimulus card. Arrangement of colors, 
letters, and procedure of experimentation have been
described in Chapter I.

Span of visual apprehension for heterogeneously
colored stimuli has high reliability by three methods of
scoring, as has been shown in Chapter II. To dis- 
cover the effect of color on visual apprehension with 
heterogeneously colored stimuli, it was necessary to 
score the responses somewhat differently than with ho- 
mogeneous colors. Since each color occurred once on 
every stimulus card, and an equal number of times (3) 
in each letter position during the 24 series, the total 
number of letters reproduced of a given color will 
yield a measure of apprehension for that color. A 
comparison of these measures for the various colors 
will demonstrate any effect which color has on the ap- 
prehension of heterogeneously colored letters. 

The responses were scored for the number of times 
letters of each color were reproduced in the course of 
the 24 series. A letter was called correct, irrespective 
of sequence of reproduction. The highest possible 
score was 24, since each color appeared only once in 
each series. 
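The scoring rule just described can be sketched in Python. The actual arrangement of colors is given in Chapter I and is not reproduced here, so the sketch uses a hypothetical cyclic layout in which card i shifts the eight colors by i places. That layout is an illustration only; it happens to share the balance property the text relies on, with each color appearing 3 times in each letter position over 24 cards. A letter counts for its color whenever it is reproduced, regardless of order.

```python
from collections import Counter

COLORS = ["violet", "red", "green", "gray", "orange", "black", "blue", "yellow"]


def cyclic_cards(n_cards=24):
    """HYPOTHETICAL layout: card i shows the eight colors rotated by i places.
    Illustrative only; the monograph's own arrangement is given in Chapter I."""
    k = len(COLORS)
    return [[COLORS[(i + p) % k] for p in range(k)] for i in range(n_cards)]


def score_by_color(cards, reproduced_positions):
    """Tally reproduced letters by color.

    reproduced_positions[i] is the set of letter positions (0-7) on card i
    that were reproduced; the order of reproduction is ignored.
    """
    scores = Counter({color: 0 for color in COLORS})
    for card, positions in zip(cards, reproduced_positions):
        for p in positions:
            scores[card[p]] += 1
    return scores


cards = cyclic_cards()

# Balance check: every color occupies every letter position exactly 3 times.
per_position = Counter((p, color) for card in cards for p, color in enumerate(card))
print(set(per_position.values()))  # {3}

# Reproducing every letter gives the maximum score of 24 for each color.
perfect = score_by_color(cards, [set(range(8))] * 24)
print(perfect["yellow"])  # 24

# Reproducing only the first five letters on every card still treats all
# colors alike under this balanced layout: 15 letters of each color.
first_five = score_by_color(cards, [set(range(5))] * 24)
print(sorted(set(first_five.values())))  # [15]
```

The last case illustrates why the balance matters: a purely positional habit, such as losing the last letters of each series, penalizes every color equally, so remaining differences between colors can be attributed to color.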


TABLE 5

Apprehension of Heterogeneously Colored Letters by 100
Men, by 100 Women, and by the Combined Group

           100 Men             100 Women           Men + Women
Color      Mean        σM      Mean        σM      Mean        σM
Black      18.67 (1)*  .30     18.14 (3)   .29     18.40 (1)   .21
Orange     18.40 (2)   .27     18.20 (1)   .27     18.30 (2)   .19
Violet     18.34 (3)   .29     17.77 (6)   .30     18.06 (4)   .21
Blue       18.31 (4)   .29     18.16 (2)   .27     18.24 (3)   .20
Red        18.23 (5)   .29     17.82 (5)   .29     18.02 (5)   .21
Gray       17.90 (6)   .28     17.98 (4)   .29     17.94 (6)   .20
Green      17.77 (7)   .28     17.37 (8)   .29     17.57 (7)   .20
Yellow     17.36 (8)   .31     17.53 (7)   ?       17.44 (8)   .24

*The numbers in parentheses indicate the ranks from the highest to the
lowest average.
? = illegible in the scanned source.
The means and the standard errors of the means for
the number of times letters of each color were apprehended
by men, by women, and by the combined group
are given in Table 5. The numbers in parentheses following
each mean indicate the relative rank (1 = highest
average). Columns 3, 5, and 7 of the table show that variability
of the means is relatively constant from color
to color.

As will be shown later, there are no sex differences in
span of visual apprehension. When the responses are
scored for colors apprehended, however, the colors
change rank somewhat from men to women. For men
the rank order from highest to lowest mean score is
black, orange, violet, blue, red, gray, green, yellow;
while for women it is orange, blue, black, gray, red,
violet, yellow, green. The greatest discrepancy between
men and women, however, is only 3 ranks (violet). The
correlation between the ranks of men and of women is
+.714. The rest of the discussion will be based on the
results of the combined group shown in Columns 6 and
7 (N = 100 men + 100 women), since, in general, the
results for both men and women reveal the same trends.

The effect of color on apprehension for the com- 
bined group is shown by the differences between the 
means in Column 6 of Table 5. The range of these 
differences for each color in comparison with all the 
others is given below: 

Black    .10 to .96      Red      .08 to .58
Orange   .06 to .86      Gray     .08 to .50
Violet   .04 to .62      Green    .13 to .83
Blue     .06 to .80      Yellow   .13 to .96

These differences range from .04 to .96. While many
of the differences are small, some are large enough to
yield comparatively marked stability for the direction
of the difference.
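The ranges of differences listed above follow from the combined-group means in Column 6 of Table 5. The Python sketch below recomputes them as a check. It reproduces every printed range except the lower bound for red: the smallest computed difference for red is .04 (red against violet), whereas the printed .08 is the difference between red and gray.

```python
from itertools import combinations

# Combined-group (M + W) means from Column 6 of Table 5.
MEANS_MW = {
    "Black": 18.40, "Orange": 18.30, "Violet": 18.06, "Blue": 18.24,
    "Red": 18.02, "Gray": 17.94, "Green": 17.57, "Yellow": 17.44,
}


def difference_ranges(means):
    """For each color, the smallest and largest absolute difference
    between its mean and the mean of every other color."""
    diffs = {color: [] for color in means}
    for a, b in combinations(means, 2):
        d = round(abs(means[a] - means[b]), 2)
        diffs[a].append(d)
        diffs[b].append(d)
    return {color: (min(v), max(v)) for color, v in diffs.items()}


for color, (lo, hi) in difference_ranges(MEANS_MW).items():
    print(f"{color:7s} {lo:.2f} to {hi:.2f}")
```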

The significance of each difference has been evaluated
by computing the ratio of the difference to the
standard error of the difference. These ratios (for the
combined group) are found in Table 6. The colors
in Column 1 have been listed according to size of average
score, with the highest apprehension score (black)
at the top. The standard error of the difference was
computed by the formula for correlated measures.

TABLE 6

The Ratios D/σD for the Differences between the Averages
Given in Column 6 of Table 5

          Black  Orange  Blue   Violet  Red    Gray   Green  Yellow
Black     -      0.60    0.88   1.98    2.17   2.88   4.93   4.30
Orange    0.60   -       0.38   1.42    1.69   2.18   4.50   4.20
Blue      0.88   0.38    -      1.03    1.16   2.03   3.81   4.11
Violet    1.98   1.43    1.03   -       0.23   0.70   2.79   2.97
Red       2.17   1.69    1.16   0.23    -      0.49   2.80   2.60
Gray      2.88   2.18    2.03   0.70    0.45   -      2.23   2.36
Green     4.93   4.50    3.81   2.79    2.80   2.23   -      0.60
Yellow    4.30   4.20    4.11   2.97    2.60   2.36   0.60   -

The ranges of the intercorrelations between the average
number of letters apprehended in each color are listed
below for men, women, and the combined group:

100 men               range = .52 to .72, average = .63
100 women             range = .42 to .72, average = .60
100 men + 100 women   range = .50 to .72, average = .62

These correlations indicate that the relative number
of letters apprehended remains fairly consistent from
color to color.

The trend of magnitude of differences and reliability
of differences is the same for men and women as for the
combined group. The conclusions from results for the 
combined group, therefore, may be considered to hold 
for a group of either 100 men or 100 women. 

The most striking trend found in Table 6 is the con- 
sistently high significance of the differences between 
average scores of both green and yellow and all the 
other colors. Chances that the direction of the differences
will be reversed in future experiments are
very slight indeed. Only an approximately chance difference
(D/σD = 0.60) exists between scores for green and yellow. Reference
to Column 6 of Table 5 shows that yellow and green
have markedly smaller averages than the other colors,
and that the difference between green and yellow is very
small (0.13).

As with homogeneous colors, to analyze the trends 
revealed in Tables 5 and 6, one should consider the pos- 
sible effects of affective value (color preferences), at- 
tention value, and luminosity of colors on the appre- 
hension of colored letters. Color preferences were ob- 
tained for both men and women along with the visual 
apprehension scores. The orders of preference are 
given below for 100 men, 100 women, and the combined
group of men plus women:



Men       Women     Men and Women
Blue      Blue      Blue
Green     Green     Green
Red       Orange    Red
Orange    Red       Orange
Violet    Violet    Violet
Yellow    Yellow    Yellow
Black     Black     Black
Gray      Gray      Gray
Blue was most preferred by both men and women,
and by the combined group. There are practically no
sex differences in affective value for these colors (Milton
Bradley’s standard colors). Red and orange are
the only colors which occupy non-corresponding ranks
in the two lists. The correlation between the rankings
for men and women is +.976. This coefficient is also
a measure of the stability of the ranking for the combined
group. The orders of preference shown above
are approximately the same as those given by other investigators.
Color preferences are stable and lasting
and apparently are not dependent to any marked degree
upon the temporary condition of the person making the
choice.†

For convenience of comparison the eight colors are
ranked for average number of letters apprehended,
preference, attention value, and relative luminosity in
Table 7. Black had the largest average number of letters
apprehended, blue was the most preferred color,
orange had the highest attention value, and yellow was
the most luminous color.

TABLE 7

Rankings for Average Number of Letters Apprehended,
Color Preference, Attention Value, and Luminosity
of Colors

Apprehension   Color         Attention   Relative
average        preference    value*      luminosity
Black          Blue          Orange      Yellow
Orange         Green         Red         Green
Blue           Red           Blue        Orange
Violet         Orange        Black       Gray
Red            Violet        Green       Blue
Gray           Yellow        Yellow      Violet
Green          Black         Violet      Red
Yellow         Gray          Gray        Black

*Adams (1, Table 4).

†For a summary of results on affective value of colors see Poffenberger
(20, pp. 438-447).

Certain trends are noticeable from mere inspection
of the successive rankings. There appears to be no relation
between the rankings for apprehension and for
preference. Apparently, however, there is some correspondence
between ranking for apprehension and attention
value, for there are only slight differences between
ranks in the two series for orange, blue, green,
yellow, and gray. But between ranking for apprehension
and luminosity the relationship appears to be an
inverse one, as indicated by the positions of yellow,
green, and black. These relationships are given mathematical
expression in the correlations which follow.
Coefficients of correlation are given for men and for
women as well as for the combined group of men plus
women. Similar trends in all subgroups will allow
conclusions to be drawn with greater confidence.


Colors apprehended vs. color preferences (men)     ρ = -.071
Colors apprehended vs. color preferences (women)   ρ = +.095
Colors apprehended vs. color preferences (M + W)   ρ = +.024
Colors apprehended vs. attention value (men)       ρ = +.429
Colors apprehended vs. attention value (women)     ρ = +.190
Colors apprehended vs. attention value (M + W)     ρ = +.524
Color preferences vs. attention value (men)        ρ = +.524
Color preferences vs. attention value (women)      ρ = +.667
Color preferences vs. attention value (M + W)      ρ = +.527
Colors apprehended vs. luminosity (men)            ρ = -.690
Colors apprehended vs. luminosity (women)          ρ = -.337
Colors apprehended vs. luminosity (M + W)          ρ = -.667
Attention value vs. luminosity (men)               ρ = -.119
Attention value vs. luminosity (women)             ρ = -.071
Attention value vs. luminosity (M + W)             ρ = -.130
Color preferences vs. luminosity (men)             ρ = +.095
Color preferences vs. luminosity (women)           ρ = +.105
Color preferences vs. luminosity (M + W)           ρ = +.095
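The combined-group coefficients can be checked from the rankings in Table 7, assuming (as the symbol ρ suggests, though the formula is not printed) the rank-difference formula ρ = 1 - 6Σd²/(n(n² - 1)). The Python sketch below reproduces the M + W values for apprehension against preference, attention value, and luminosity, as well as the men-women correlations of +.976 for preference and +.714 for apprehension. The sub-group coefficients involving attention value rest on separate rankings for men and women that are not printed here, so they are not recomputed.

```python
def spearman_rho(ranks_x, ranks_y):
    """Rank-difference correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(ranks_x)
    d2 = sum((ranks_x[k] - ranks_y[k]) ** 2 for k in ranks_x)
    return 1 - 6 * d2 / (n * (n * n - 1))


def to_ranks(order):
    """Map a list ordered from first to last place onto ranks 1, 2, ..."""
    return {item: i + 1 for i, item in enumerate(order)}


# Combined-group (M + W) rankings from Table 7.
apprehension = to_ranks(["Black", "Orange", "Blue", "Violet", "Red", "Gray", "Green", "Yellow"])
preference = to_ranks(["Blue", "Green", "Red", "Orange", "Violet", "Yellow", "Black", "Gray"])
attention = to_ranks(["Orange", "Red", "Blue", "Black", "Green", "Yellow", "Violet", "Gray"])
luminosity = to_ranks(["Yellow", "Green", "Orange", "Gray", "Blue", "Violet", "Red", "Black"])

for name, other in [("preference", preference), ("attention", attention), ("luminosity", luminosity)]:
    print(name, round(spearman_rho(apprehension, other), 3))
# preference 0.024
# attention 0.524
# luminosity -0.667

# Men vs. women: color preference orders and apprehension ranks (Table 5).
men_pref = preference  # the men's order is identical to the combined order
women_pref = to_ranks(["Blue", "Green", "Orange", "Red", "Violet", "Yellow", "Black", "Gray"])
men_app = to_ranks(["Black", "Orange", "Violet", "Blue", "Red", "Gray", "Green", "Yellow"])
women_app = to_ranks(["Orange", "Blue", "Black", "Gray", "Red", "Violet", "Yellow", "Green"])
print(round(spearman_rho(men_pref, women_pref), 3))  # 0.976
print(round(spearman_rho(men_app, women_app), 3))    # 0.714
```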

The coefficients of -.071, +.095, and +.024 demonstrate
a complete absence of relationship between apprehension
of colored letters and color preference.
Immediate memory (visual apprehension) of colored 
letters, therefore, is not affected by the pleasant or un- 
pleasant feeling tones which usually accompany the 
perception of the colors. 

It is often stated that one tends to recall the pleasant 
more readily than the unpleasant. There is, however, 
considerable disagreement concerning the influence of 
feeling tone on memory. Tait (24) found that pleasant 
colors were recognized more frequently than un- 
pleasant or indifferent ones. Although our results do 
not agree with Tait’s, they are supported by the experimental
findings of Gordon (9, 10) and Anderson and
Bolton (2). In her first study, where the procedure 
was similar to ours, Gordon attempted to determine 
whether the pleasantness or unpleasantness accompany- 
ing the perception of simple geometric figures and 
colors has any influence on the accuracy of memory for 
these experiences. The observers, after viewing more 
or less complex figures for three seconds, described the 
pictures immediately after they were removed from 
vision. Each figure was classed as pleasant, unpleasant, 
or indifferent. In another part of the experiment, the 
observers viewed for one second figures consisting of 
9 colored squares. At the end of the exposure the ob- 
server reported the affective value of the object and 
named the colors seen. The immediate memory of
both geometric figures and colored objects was practically
the same for pleasant, unpleasant, and indifferent
figures. In delayed memory (3 weeks) recognition was
equally good for figures of all degrees of pleasantness. 
The conclusion is that there is no direct influence of 
feeling tone on memory. 

In her second study, Gordon discovered no relation- 
ship between affective value and immediate memory 
for odors. The correlation between ranks for affective 
value and memory was -.07. Anderson and Bolton
also employed odors for stimuli and found no significant
difference between immediate memory of pleasant and
unpleasant stimuli. Many other experiments have studied
the influence of feeling tone on memory for various 
types of experiences. Certain investigators claim to 
have demonstrated the presence, others the absence, of 
such influence. 

The present study corresponds most nearly to Gordon’s
first experiment. The principal difference was
the determination in our experiment of a rank order
for the affective value of colors appearing on the stimulus
cards. Our results correspond to those of Gordon
and others who have found no influence of affective
value on immediate memory.

The coefficients of +.429, +.190, and +.524 between 
apprehension score and attention value demonstrate 
that attention value is potent to some degree in deter- 
mining apprehension of heterogeneously colored let- 
ters. The tendency is for colors of greater attention 
value to have larger apprehension scores on the average.
Reference to Table 7 reveals that both orange and
blue rank high in apprehension score and attention 
value, while both yellow and gray rank low. 

Rankings for color preference correspond somewhat 
to those for attention value. The correlations are
+.524, +.667, and +.527, showing that most preferred 
colors tend to have greater attention value. Hence, 
while related to attention value, color preference does 
not influence apprehension of colored letters, although 
attention value apparently does to a certain degree. 

In general, the greater the luminosity of a color the 
smaller the apprehension score. This is shown by the 
correlations between luminosity and apprehension. 
The coefficients are -.690, -.337, and -.667. This relationship
is apparently due to the variation of brightness
contrast between colored letters and white background
(cardboard). The brightness contrast ranges
from 100 per cent for black letters on white background
to approximately 24 per cent for yellow on white. The
large amount of contrast for black letters on white
cardboard results in the apprehension of the largest
number of letters. On the other hand, the relatively
small contrast between yellow or green letters and
the white background results in a much smaller apprehension
score than with any of the other colors (see
Tables 5 and 7).
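The 24 per cent figure for yellow can be checked with a simple ratio. The sketch below assumes that brightness contrast means the luminosity difference between background and letter, expressed as a percentage of the background luminosity, with the white cardboard taken as 100 per cent and black ink as 0. The yellow value (75.7 per cent) is borrowed from the Hering-paper luminosities quoted in the footnote on Hart's study later in this chapter. The luminosities of the present study's own colors are not given here, so the numbers are illustrative only.

```python
def brightness_contrast(background: float, letter: float) -> float:
    """Percentage contrast of a letter against its background.

    Assumed definition: (background - letter) / background * 100,
    with luminosities given in per cent (white taken as 100).
    """
    if background <= 0:
        raise ValueError("background luminosity must be positive")
    return (background - letter) / background * 100.0


WHITE = 100.0  # assumed luminosity of the white cardboard
BLACK = 0.0    # idealized black ink
YELLOW = 75.7  # Hering-paper yellow, used here only as a stand-in

print(f"black on white:  {brightness_contrast(WHITE, BLACK):.1f} per cent")
print(f"yellow on white: {brightness_contrast(WHITE, YELLOW):.1f} per cent")
```

Under these assumptions black on white gives 100.0 per cent and yellow on white about 24.3 per cent. This matches the range stated above.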

Although there is less relationship between appre- 
hension scores and both attention value and luminosity 
for women than for men, the effect of this sex differ- 
ence is not apparent with the combined group of men 
plus women. The larger group yields practically the
same correlation as the men. 

Since the correlations between attention value and 
luminosity are so small, there is probably little or no 
relationship present. The coefficients are -.119,
-.071, and -.130. Similarly, color preferences appear
unrelated to luminosity. The coefficients are +.095,
+.105, and +.095.

The above analysis of the data presented in Tables 
5, 6, and 7 leads to the conclusion that at least two 
agents are operating to produce differences in the av- 
erage number of letters apprehended for various colors 
in heterogeneous color series. These factors are the 
relative luminosity and the attention value of the colors. 
There is a tendency for less luminous colors to have
higher apprehension scores. This is due apparently to
the brightness contrast between color and white
background. An important exception to this trend is the
score for orange. Orange has relatively high luminosity,
which means small brightness contrast between color
and background, and also a high apprehension score.
The large apprehension score of orange is to be explained
in terms of attention value, since it has greater
attention value than any other color and there is an
appreciable correlation between attention value and
apprehension score for colored stimuli. The affective
value of colors has no influence on apprehension of
colored stimuli (i.e., on immediate memory for briefly
exposed objects).
Homogeneous versus Heterogeneous Colors

The above discussion indicates that color has a some- 
what different effect on visual apprehension with het- 
erogeneously than with homogeneously colored stimuli. 
With homogeneous colors, the effects appear to be in- 
consistent to a considerable degree. The spans of 
apprehension for violet, red, green, gray, and orange 
group themselves at one end of the distribution with 
relatively large scores; then come black and blue with 
decidedly smaller spans; and, finally, yellow is at the 
bottom with a very low apprehension score. The one 
distinguishable trend in the data is the slight negative 
correlation between luminosity of colors and apprehension
score, i.e., the greater the luminosity the smaller
the span. This is especially noticeable for the yellow
letters, which have the highest luminosity by far, and a
span which is much smaller than those for any other
colors. This trend is partly nullified, however, by the
low apprehension scores for black and blue stimuli.
These positions of black and blue, between yellow and
the other colors, are inconsistent and apparently unexplainable
in terms of luminosity, affective value, or
attention value of the homogeneously colored letters. 

There appears to be no relationship between appre- 
hension scores in heterogeneous and homogeneous 
colors. Black, orange, and blue, which held positions 
1, 2 and 3 in heterogeneous colors, had positions 6, 5, 
and 7, respectively, when the colored stimuli were 
homogeneously arranged. The only point of agree- 
ment is that yellow produced the lowest apprehension 
score in both kinds of series. The correlation between
ranking of homogeneous and heterogeneous colors is
-.029. We noted above, however, that, if black and
blue were omitted from the homogeneous series, there
would be a high negative correlation (-.901) between
span and relative luminosity of colors. In such a case 
there would be an appreciable correlation between the 
rankings of apprehension scores in the two kinds of 
series. 

Consideration of the results from heterogeneously 
colored stimuli, however, reveals more consistent trends 
than were found with homogeneous colors. Both at- 
tention value and relative luminosity affect apprehen- 
sion of heterogeneously colored letters to a considerable 
degree. The consistent trends in the results are un- 
doubtedly influenced by the heterogeneity of the colors 
in the stimulus series. With eight different colors on 
each stimulus card the situation is more favorable for 
relative attention value to become effective than when 
each card has all one color which changes from card to 
card (successive series). In any heterogeneous series, 
therefore, a letter whose color has high attention value
is apprehended more frequently than the letters whose 
colors possess less attention value. 

In like manner, the relative luminosity of heterogene- 
ously colored letters appears to have more effect than 
that of homogeneous colors on visual apprehension. 
With several colors of varying luminosity opportunity 
arises for letters differing in relative brightness to com- 
pete with each other in visibility due to the brightness 
contrast between colored letter and white background. 
It has been shown above that there is an appreciable 
negative correlation between luminosity of colors and
apprehension score. Apparently, this relation is caused 
by the difference in brightness contrast just described. 
This effect of relative luminosity is evidently different
from that of attention value, for no correlation exists between
the two. 

In a situation where letters of different colors are
simultaneously exposed, as in the heterogeneous color
series, the various colors apparently exert a differential
effect on apprehension (immediate memory) of the
letters. It is possible that this factor also influences the
total number of letters reproduced correctly after the
exposure of any series. It will be remembered that in
the homogeneous series this variation of color from
letter to letter was absent. A comparison, therefore,
of span for heterogeneous with that for homogeneous
colors should reveal whatever influence heterogeneity 
of the colors has on total span. These comparisons are 
found in Table 8. 


TABLE 8

Effect of Homogeneous versus Heterogeneous Colors on
Span of Visual Apprehension

                     Homogeneous colors        Heterogeneous colors
Scoring
method   Group        Mean    σM       Group    Mean    σM       D*      D/σD

I        100 M + W    5.75    .08      100 M    5.56    .09     +.19     1.63
I        100 M + W    5.75    .08      100 W    5.45    .09     +.30     2.55
II       100 M + W    6.22    .07      100 M    6.08    .08     +.14     1.23
II       100 M + W    6.22    .07      100 W    5.98    .08     +.24     2.18
III      100 M + W    5.27    .08      100 M    4.97    .09     +.30     2.35
III      100 M + W    5.27    .08      100 W    4.98    .10     +.29     2.19

*A plus indicates that the difference is in favor of the homogeneous colors.

As stated before, the observers for homogeneous colors 
were a mixed group with the same number of men and 
women. For the heterogeneous colors, however, there
were two subgroups, of 100 men, and 100 women, re- 
spectively. Before comparisons can be made between 
spans for the heterogeneous and homogeneous series, 
therefore, one must know whether sex differences occur 
in visual apprehension of colored letters. In Column 
6 of Table 8 are found the mean scores of both men 
and women by the three methods of scoring. The average
spans for men and for women were 5.56 and 5.45,
respectively, in Scoring Method I; 6.08 and 5.98 in
Method II; and 4.97 and 4.98 in Method III. The
magnitudes of the differences between these average
scores of men and women are very small and statistically
insignificant. For all practical purposes, the scores
for men and women are approximately identical.

The comparison between span for homogeneous and 
heterogeneous colors is made by subtracting the mean 
score in Column 6 from the mean score in Column 3. 
The obtained difference is found in Column 8, and the 
ratio of the difference to the standard error of the dif- 
ference in Column 9. A plus sign before any differ- 
ence indicates that it is in favor of the homogeneously 
colored letters. Examination of the differences between 
all groups and by all methods of scoring reveals a range 
of +.14 to +.30, and shows that every difference is in 
favor of the homogeneous colors. While the direction 
of any one of these differences does not have great 
stability, the fact that all differences are in the same 
direction warrants the tentative conclusion that hetero- 
geneity, in comparison with homogeneity of colored 
stimuli, tends to shorten the average span of visual ap- 
prehension to a slight degree. 
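The critical ratios in Column 9 of Table 8 can be approximated from the tabled means and standard errors. This is a minimal sketch. It assumes the usual formula for two independent groups, σD = sqrt(σM1² + σM2²). The homogeneous σM for Method III is taken as .08 in both rows, because the extracted table prints it once as .03 and once as .08.

Working from the rounded entries, the ratios come out at about 1.58, 2.49, 1.32, 2.26, 2.49, and 2.26. These are close to, but not identical with, the tabled 1.63, 2.55, 1.23, 2.18, 2.35, and 2.19. The authors may have used unrounded values or a different error formula. Applied to the men-versus-women comparisons, the same function gives ratios of roughly +0.86, +0.88, and -0.07. All of these are well under 1, which agrees with the statement that the sex differences are statistically insignificant.

```python
import math


def critical_ratio(mean_a: float, se_a: float, mean_b: float, se_b: float) -> float:
    """Difference between two means divided by the standard error of the difference.

    Assumes independent groups, so se_diff = sqrt(se_a**2 + se_b**2).
    """
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return (mean_a - mean_b) / se_diff


# (method, homogeneous mean, sigma_M, heterogeneous group, mean, sigma_M) from Table 8.
ROWS = [
    ("I", 5.75, 0.08, "men", 5.56, 0.09),
    ("I", 5.75, 0.08, "women", 5.45, 0.09),
    ("II", 6.22, 0.07, "men", 6.08, 0.08),
    ("II", 6.22, 0.07, "women", 5.98, 0.08),
    ("III", 5.27, 0.08, "men", 4.97, 0.09),
    ("III", 5.27, 0.08, "women", 4.98, 0.10),
]

for method, hom_mean, hom_se, group, het_mean, het_se in ROWS:
    cr = critical_ratio(hom_mean, hom_se, het_mean, het_se)
    print(f"Method {method:>3}, {group:<5}: D = {hom_mean - het_mean:+.2f}, D/sigma_D = {cr:.2f}")

# Men versus women within the heterogeneous series (means and sigma_M from Table 8).
for method, men, men_se, women, women_se in [
    ("I", 5.56, 0.09, 5.45, 0.09),
    ("II", 6.08, 0.08, 5.98, 0.08),
    ("III", 4.97, 0.09, 4.98, 0.10),
]:
    print(f"Method {method:>3}, men - women: {critical_ratio(men, men_se, women, women_se):+.2f}")
```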



IV 

RELATED INVESTIGATIONS 

There are a number of studies whose results have 
either a direct or an indirect bearing upon the influ- 
ence of color on visual apprehension and perception. 
The purpose of the present chapter is to analyze these 
results in sufficient detail to yield, when correlated with 
the present investigation, a clear picture of the various 
factors operative in determining the apprehension and
perception of both chromatic and achromatic stimuli. 

Apprehension and Perception of Colored Stimuli 
Hart (12) determined the range of visual attention,
cognition, and apprehension* for colored stimuli. He
calculated limens for each of four subjects in each of
two series of stimuli. Red, blue, green, and yellow dots
(Hering papers), pasted in homogeneous series on gray
backgrounds, served as stimuli. The results for the
four subjects have been combined into one group and
the average limens ranked in Table 9. Results for all
degrees of assurance are included (Hart's 5 + 4 + 3
data). The table should be read as follows: for the
category "attention," in Series I, the ranks of the limens
are: red = 1 (largest), blue = 3, green = 4, yellow
= 2. Similarly for Series II, and for both (I plus II)
combined. The results may be stated in Hart's words: "In
every case, red in general gives a high limen and green
a low one. Yellow and blue have a tendency to be
intermediate and variable. This situation is in general
true for all 3 systematic categories and for all 3
degrees of subjective assurance . . ." While the individual
results showed considerable variation, the summary of
group results given in Table 9 reveals a rather marked
tendency for the colors to take the ranks: red (1), blue,
yellow, green.

TABLE 9

Ranks of Limens (1 = Largest Limen) for Range of Attention,
Cognition, and Apprehension of Colored Stimuli
(from Hart)

            Attention       Cognition       Apprehension     All
Color       I   II  Both    I   II  Both    I   II  Both     three

Red         1   1   1       1   1   1       1   1   1        1
Blue        3   2   3       2   2   2       2   2   2        2
Green       4   4   4       4   4   4       4   3   3        4
Yellow      2   3   2       3   3   3       3   4   4        3

*Three typical processes, as determined from introspective reports,
were first classified as "attention," "cognition," and "apprehension"
by H. S. Oberly (18). Since each category involves apprehension, all
are included in our summary table.
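The "All three" column of Table 9 can be cross-checked against the three "Both" columns. The sketch assumes a simple summary rule: average each color's "Both" ranks and re-rank. Hart's own combination may have been based on the limens themselves, which are not reproduced here, so agreement is only a consistency check. The green entry under Attention, Both, is read as 4 from a damaged line of the extracted table.

```python
from statistics import mean

# "Both" (Series I plus II) ranks from Table 9: (attention, cognition, apprehension).
BOTH_RANKS = {
    "red": (1, 1, 1),
    "blue": (3, 2, 2),
    "green": (4, 4, 3),
    "yellow": (2, 3, 4),
}

# Average the three category ranks for each color (an assumed summary rule).
average_rank = {color: mean(ranks) for color, ranks in BOTH_RANKS.items()}

# Order colors from largest limen (smallest average rank) to smallest.
order = sorted(average_rank, key=average_rank.get)
for position, color in enumerate(order, start=1):
    print(position, color, round(average_rank[color], 2))
```

The average ranks come out as red 1, blue 2.33, yellow 3, and green 3.67. This gives the order red, blue, yellow, green, the same as the last column of the table.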

Hart rejects intensity (luminosity) of the colors as 
an explanation of the differences obtained but suggests 
that differences in lag of visual sensation (Anklingen
times) for the various colors are a more probable cause.

In considering the relative brightness of his colors, 
Hart has failed to evaluate the brightness of any color 
in relation to its background. The background was 
gray, but no statement concerning its brightness is
given. It is probable that some of the colors (red and
blue) were darker and the others (yellow and green)
brighter than the background. It is also possible that
red showed the greatest, green the least, and yellow
and blue a moderate amount of brightness contrast
between color and background.† If this be true, and the
evidence is in its favor, then brightness or luminosity
contrast between color and background does offer a
satisfactory explanation of the differences between
limens for apprehension of the colored stimuli. As
will be shown later, retinal lag appears to be intimately
related to intensity or brightness of the stimulus.

†Percentage luminosities for standard Hering papers are
approximately: red, 5.2; blue, 14.7; green, 44.3; and yellow, 75.7.
If the neutral gray employed as background had 40 to 50 per cent
of luminosity, the above statements are valid.
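The footnote's reasoning can be made concrete by computing the absolute luminosity difference between each Hering paper and a gray background. Hart gives no value for his gray, so the background levels tried below (40, 45, and 50 per cent) are assumptions spanning the footnote's range. Green shows the smallest difference at every level. At 50 per cent the full order is red, blue, yellow, green, the same as the combined ranking in Table 9. At 40 per cent, yellow (35.7) edges just past red (34.8).

```python
# Approximate percentage luminosities of standard Hering papers (from the footnote).
HERING = {"red": 5.2, "blue": 14.7, "green": 44.3, "yellow": 75.7}


def contrast_order(background: float) -> list[tuple[str, float]]:
    """Colors sorted from greatest to least absolute luminosity difference
    against a gray background of the given percentage luminosity."""
    diffs = {color: abs(lum - background) for color, lum in HERING.items()}
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


for gray in (40.0, 45.0, 50.0):  # hypothetical background luminosities
    ranked = ", ".join(f"{c} {d:.1f}" for c, d in contrast_order(gray))
    print(f"gray at {gray:.0f} per cent: {ranked}")
```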

When one color is printed upon another as a background,
the legibility of the type may become very poor
due to low visibility of the letters. Luckiesh (15, pp.
246-251) reports results for 13 combinations of colored
print and background for printed matter read from a
distance. The order from most to least legible
follows:


 1. Black  printed matter on yellow background
 2. Green     "       "    on white      "
 3. Red       "       "    on white      "
 4. Blue      "       "    on white      "
 5. White     "       "    on blue       "
 6. Black     "       "    on white      "
 7. Yellow    "       "    on black      "
 8. White     "       "    on red        "
 9. White     "       "    on green      "
10. White     "       "    on black      "
11. Red       "       "    on yellow     "
12. Green     "       "    on red        "
13. Red       "       "    on green      "


Details of the experimental conditions such as kind 
of ink and paper employed, size of type, text used, etc.,
are omitted from the report. This is unfortunate because,
when a white background is specified, one should
know whether pure white, light gray, or light cream
color is meant. Many kinds of print paper that are
called white are really a light gray, and others are
cream colored (light buff). Inspection of the above
list of color combinations, however, shows that the
combinations involving the greater contrast between
print and background, as black on yellow, blue on
white, etc., are among the most legible. Those with
small brightness contrast, as red on green and green on
red, possess very poor visibility. The reason why
black on white is in the middle of the series is not clear.
This should have the greatest luminosity or brightness
contrast. Perhaps the white background was not
pure white, or a glare from the white background may
have reduced the visibility of the black print. In 
general, however, Luckiesh’s results show that legi- 
bility of the printed material is largely determined by 
the luminosity difference between color of print and 
color of background. 

In a more carefully controlled experiment, Tinker 
and Paterson (29) studied the influence of variations in 
color of print and background on speed of reading. 
The Chapman-Cook Speed of Reading Tests, in which
score on Form B is equal to score on Form A when 
typography is identical in the two forms, were utilized 
as the measuring instrument. Form A was printed 
with black ink on white Rainbow cover-stock and 
served as a standard in each comparison. Form B was 
printed with Ruxton’s colored ink on Rainbow cover- 
stock in 10 variations of color combinations. Scores 
were obtained from 850 subjects. When score on Form 
B (color combination) was compared to score on Form 
A (black on white) and the color combinations ranked 
for influence on speed of reading, the order from most 
to least legible was found to be:

 1. Black  print on white  background
 2. Green    "   on white      "
 3. Blue     "   on white      "
 4. Black    "   on yellow     "
 5. Red      "   on yellow     "
 6. Red      "   on white      "
 7. Green    "   on red        "
 8. Orange   "   on black      "
 9. Orange   "   on white      "
10. Red      "   on green      "
11. Black    "   on purple     "

Black on white, which has the greatest luminosity 
contrast between print and background, is distinctly the 
most legible text. Green on white, blue on white, and 
black on yellow, are all only slightly poorer than black 
on white. Red on yellow and red on white possess fair 
visibility, but all the remaining combinations (7
through 11) are poor. In the light of their results,
Tinker and Paterson, in recommendations for hygienic
printing, formulate the following rule: "In combining
colors (color of ink and paper) care must be taken to 
produce a printed page which shows a maximum 
brightness contrast between print and background." 

In the Tinker and Paterson experiment the measure- 
ment was in terms of speed of reading. To obtain a 
non-speed measure of the influence on legibility of 
variations in color of print and background, Preston, 
Schwankl, and Tinker (21) determined the effect of 
color combinations on the perception of isolated words. 
The greatest average distance from the eyes at which 
words printed in any color combination could be read 
constituted the measure of legibility for that combina- 
tion. Black print on white paper was employed as a 
standard to compare with each of 10 other combinations.
The color combinations produced marked differences
in comparative legibility of print. They are
listed below, in order of legibility, the most legible 
combination in rank 1 : 


 1. Blue   print on white  background
 2. Black    "   on yellow     "
 3. Green    "   on white      "
 4. Black    "   on white      "
 5. Green    "   on red        "
 6. Red      "   on yellow     "
 7. Red      "   on white      "
 8. Orange   "   on black      "
 9. Black    "   on purple     "
10. Orange   "   on white      "
11. Red      "   on green      "


These are the same color combinations used by Tinker 
and Paterson in the above-cited experiment. In both 
these studies, a more adequate interpretation of results 
is made possible by a knowledge of the appearance of 
the color combinations to an observer. The combina- 
tions, printed in Ruxton’s ink on Rainbow cover-stock, 
are listed below with a description of the appearance 
of each in parentheses after the trade name (color of
backgrounds approached maximum saturation):

Trade Name                    Observed Effect

Black jobbing on white        (Black on light grayish white)
Grass green on white          (Dark green on light grayish white)
Lustre blue on white          (Dark blue on light grayish white)
Black jobbing on yellow       (Black on yellow with slight orange tinge)
Tulip red on yellow           (Light red on yellow with slight orange tinge)
Tulip red on white            (Light red on light grayish white)
Grass green on red            (Dark grayish green on red)
Chromium orange on black      (Dark lemon yellow on dark grayish black)
Chromium orange on white      (Light orange on light grayish white)
Tulip red on green            (Dark brown on dark green)
Black jobbing on purple       (Black on dark purple)

A comparison of the observed effect with the ranking 
of the above list for comparative legibility reveals the 
fact that, in general, the greater the luminosity dif- 
ference between print and background, the greater the 
legibility of the color combination. The ranking for 
comparative legibility obtained in this study correlates 
+.864 with the ranking given by Tinker and Paterson 
who used a speed measure. This indicates that, 
whether measured in terms of speed of reading or per- 
ceptibility distance, the all-important factor condition- 
ing perception of words is the brightness contrast be- 
tween print and background. 
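The coefficient of +.864 can be reproduced from the two rank orders listed above with Spearman's rank-difference formula, rho = 1 - 6 * sum(d^2) / [n(n^2 - 1)]. The sketch assumes this is the formula the authors used. The print/background labels are shorthand introduced here. With n = 11, the squared rank differences sum to 30, giving rho = 1 - 180/1320, or about +.864.

```python
# Rank orders (1 = most legible) for the eleven print/background combinations.
TINKER_PATERSON = [
    "black/white", "green/white", "blue/white", "black/yellow",
    "red/yellow", "red/white", "green/red", "orange/black",
    "orange/white", "red/green", "black/purple",
]
PRESTON_ET_AL = [
    "blue/white", "black/yellow", "green/white", "black/white",
    "green/red", "red/yellow", "red/white", "orange/black",
    "black/purple", "orange/white", "red/green",
]


def spearman_rho(order_a: list[str], order_b: list[str]) -> float:
    """Spearman rank-order correlation for two untied rankings of the same items:
    rho = 1 - 6 * sum(d**2) / (n * (n**2 - 1))."""
    if sorted(order_a) != sorted(order_b):
        raise ValueError("both rankings must contain the same items")
    rank_b = {item: pos for pos, item in enumerate(order_b, start=1)}
    n = len(order_a)
    sum_d2 = sum((pos - rank_b[item]) ** 2 for pos, item in enumerate(order_a, start=1))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


print(f"rho = {spearman_rho(TINKER_PATERSON, PRESTON_ET_AL):+.3f}")
```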

With a somewhat different technique, Miyake, Dunlap,
and Cureton (17) determined the relative legibility
of black and colored numbers on colored and black
backgrounds. In one series, single numbers were typed
in black on red, green, yellow, and white backgrounds;
and in a second series, red, green, yellow, and white
numbers were printed on a black background. Fifteen
subjects read the numbers from tachistoscopic exposure.
The average number of times each color combination
was read correctly determined the legibility
ranks which follow, the most legible color combination
receiving a rank of 1:

Series I

1. Black  print on white  background
2. Black    "   on yellow     "
3. Black    "   on green      "
4. Black    "   on red        "

Series II

1. White  print on black  background
2. Yellow   "   on black      "
3. Green    "   on black      "
4. Red      "   on black      "

These rank orders show that the combination of colored 
print and background which produced the greatest 
luminosity difference was found most legible, i.e.,
black on white in Series I, and white on black in Series 
II. In general, as the luminosity difference between 
printed numbers and background decreased, the legi- 
bility of the color combination decreased. 

This survey and analysis of the investigations con- 
cerning the influence of color combinations on visual 
perception and apprehension in reading reveal a uni- 
form trend common to all. In the first place, hue of 
the color apparently has little or no effect on percep- 
tion or apprehension. The all-important factor seems 
to be the luminosity difference between the symbol to 
be apprehended and the background upon which it is 
printed. The data indicate this to be true for (1) the
accuracy with which colored dots on a gray back- 
ground are apprehended, (2) the speed with which 
material printed with colored ink on colored paper is 
read, (3) the distance at which words printed in colored 
ink on colored paper are perceived, and (4) the ac- 
curacy with which colored numbers on colored or 
white paper are apprehended. 

Apprehension and Perception of Achromatic Stimuli

There have been several investigations on apprehension
and perception of achromatic stimuli in which
variation in luminosity contrast between character and
background occurred. Cooper (4), employing the
systematic categories (attention, cognition, and apprehension)
used by Hart (12), determined limens for
black, dark gray, and light gray stimuli which consisted
of paper dots pasted on a white background.
The limens for the 4 subjects have been averaged and
ranked in Table 10. Results for all degrees of assurance
are included (Cooper's 5 + 4 + 3 category).
Under the category "attention," in Series I, black had
the largest limen (rank 1), dark gray the next largest,
and light gray the smallest. The other columns are to
be read in a similar manner. Examination of the group
trends in this table reveals that black has the largest
limen in all but one category (Series II, "cognition").
The ranks for dark gray and light gray vary considerably
in "attention" and "cognition" limens, but consistently
hold ranks 2 and 3, respectively, in "apprehension."
The general trend for all three categories taken
together (last column) shows black to hold rank 1, dark
gray, rank 2, and light gray, rank 3. Cooper points
out that stimulus intensities (brightness) do not appear
to have any constant effect for the results of individual
observers. In general, however, under the category
"apprehension" the darker stimuli produced larger
limens, even with individual subjects. Summing the
data of all subjects into one group accentuates the
uniformity of results under the category "apprehension"
and reveals a somewhat less consistent general trend
running through all the data. The greater the brightness
contrast between stimulus and background, the
larger the limen, i.e., there is a negative correlation
between luminosity of stimuli and size of limens,
especially under category "apprehension."

TABLE 10

Ranks of Limens (1 = Largest Limen) for Range of Attention,
Cognition, and Apprehension of Black, Dark
Gray, and Light Gray Stimuli (from Cooper)

             Attention       Cognition       Apprehension     All
Stimulus     I   II  Both    I   II  Both    I   II  Both     three

Black        1   1   1       1   2   1       1   1   1        1
Dark gray    2   3   2       3   3   3       2   2   2        2
Light gray   3   2   3       2   1   2       3   3   3        3

Employing a somewhat different technique, Taylor 
and Tinker (25) determined the spans of visual apprehension
for black, dark gray, and light gray letters on
a white background. A total of 128 subjects observed 
in the experiment which was composed of a homogeneous
series of letters in which all letters on any stimulus
card were of the same brightness, as all black, etc., and
a heterogeneous series in which each succeeding letter
on any stimulus card varied in brightness, as black followed
by dark gray, etc. The average scores of apprehension
in each series follow:


Homogeneous Series
Black        Av. = 24.63
Dark gray    Av. = 24.70
Light gray   Av. = 23.85


Heterogeneous Series
Black        Av. = 25.24
Dark gray    Av. = 24.50
Light gray   Av. = 21.06


Since the background was white, the greatest 
luminosity contrast between symbol and background 
occurred with black letters, next greatest with dark 
gray, and least with light gray. The above averages 
show that, in the homogeneous series black and dark 
gray yield approximately equivalent scores; but light 
gray has a definitely smaller score. In the heterogeneous
series more definite trends are noticeable. The
average score for black is definitely the largest, dark
gray next, and light gray the smallest by far. In both
series the difference between light gray and either dark
gray or black has rather high statistical significance,
especially in the heterogeneous series. In general, the
results indicate that the greater the luminosity difference
between symbol and background, the greater
the average score of apprehension. 

Ferree and Rand (8), in their series of experiments 
on intensity of light and speed of vision, have investi- 
gated the comparative effects for dark objects on light 
backgrounds and light objects on dark backgrounds. 
Broken circles subtending visual angles of 1 to 3 
minutes were used as test objects. Eleven degrees of
illumination intensity, from 1.25 to 100 foot-candles,
were employed. They determined the shortest exposure
during which a symbol could be distinguished
accurately. The shorter this time (faster the vision), the
more legible the symbol for any given illumination intensity
and size of test object. Four combinations of
brightness were used in constructing the test objects:
black on white, white on black, white on gray, and
black on gray. Below are listed the brightness combinations
from fastest (most legible) to slowest speed
of vision for each size of test object:

Visual angle equals 1 minute

1.25 to 5.00 foot-candles: black on white fastest, white on
black next, white on gray slowest.

7.50 to 100 foot-candles: white on black fastest, black on
white next, white on gray next, black on gray slowest.

Visual angle equals 2 minutes

1.25 to 100 foot-candles: white on black fastest, black on
white next, white on gray next, black on gray slowest.

Visual angle equals 3 minutes

1.25 foot-candles: white on black fastest, white on gray
next, black on white next, black on gray slowest.

2.50 to 10.00 foot-candles: white on black fastest, black on
white next, white on gray next, black on gray slowest.

15.00 to 100 foot-candles: white on black fastest, white on
gray next, black on white next, black on gray slowest.

In considering the above results the following state- 
ments of Ferree and Rand should be kept in mind: 
"There is a greater difference in sensation between 
object and background in case of white on black than 
black on white, due probably to physiological induction 
or contrast" when determinations of speed of discrimination
are made well above the threshold of
acuity. "The gray background used is nearer to the
black than to the white test-letter in sensation value.
There is, therefore, greater contrast between object and
background in case of the white than the black
test-letter."

In general, the above rank orders of brightness combinations
for speed of vision take the positions: white
on black, fastest, black on white next, white on gray
next, and black on gray slowest.‡ This rank order
corresponds exactly with the magnitude of luminosity
difference between symbols and backgrounds. The
greater the luminosity contrast between test-object and
background, therefore, the faster the speed of vision
(greater the legibility).

‡The fact that black on white has a faster speed than white on
black for low intensities with small test objects is explained by the
authors in terms of the nature of the test object.
In a related study, which may be interpreted in the
same way as the above, Kirschmann (14) found that 
both block letters and geometric figures were more 
easily read in white print on black background than in 
black on white. The measure of legibility was the 
angular distance from the fixation point at which the
letters and figures were recognized correctly in indirect 
vision. 

Starch (22, pp. 668-669) reports an experiment in 
which the problem of brightness contrast is studied in
a normal reading situation. Two pieces of text were
set up exactly alike, except that one was printed in
black type on a white background, and the other in
white type on a dark gray background. These selections
were read by 40 subjects at their natural rate. The
average number of words read per second was:

Black on white 6.06 

White on dark gray 4.26 

These results show a difference of 42 per cent in favor 
of the black type on white background. The text show- 
ing the greater luminosity contrast between print and 
background yielded the faster reading.
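The 42 per cent figure follows if the difference in reading rate is expressed relative to the slower rate. That is the assumption made in this sketch: (6.06 - 4.26) / 4.26 is about 0.42. Measured against the faster rate, the same gap would be about 30 per cent.

```python
def percent_advantage(faster: float, slower: float) -> float:
    """Per-cent advantage of the faster rate, taken relative to the slower rate."""
    return (faster - slower) / slower * 100.0


black_on_white = 6.06      # words per second (Starch)
white_on_dark_gray = 4.26  # words per second (Starch)
print(f"{percent_advantage(black_on_white, white_on_dark_gray):.1f} per cent")
```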

In a somewhat similar experiment, Paterson and 
Tinker (19), using the two equivalent forms of the
Chapman-Cook Speed of Reading Tests with 280 sub- 
jects, compared the speed of reading black print on 
white background with white print on black back- 
ground. They found a 10.5 per cent difference in 
favor of the black on white printing arrangement. 

Holmes (13), employing, as a measure of legibility, 
the distance from the eye at which a symbol could be 
read correctly, compared the perceptibility of isolated
words printed in black type on white background with 
words printed in white type on black background. 
Averages from 20 subjects showed a 147 per cent ad- 
vantage for the black type on a white background. The 
results of the last two investigations appear to contra- 
dict the previously cited findings of Ferree and Rand. 
There is urgent need of further study of the legibility 
of black versus white print in the normal reading situa- 
tion. There are suggestions of factors other than 
brightness contrast, which may cause the differences in 
favor of the black on white in perceiving printed words. 
A study, now in progress at Minnesota, is making an 
analysis of the influence of size and nature of the 
printed symbols, and other factors which may influence 
the perceptibility of black and white print. 

This summary and analysis of the experimental results dealing with the effect of brightness combinations on visual apprehension and perception in reading shows an almost universal trend. The most important factor conditioning the perceptibility of printed characters appears to be the luminosity difference between the symbol to be apprehended and its background. The cited results show this to be true for: (1) the accuracy with which black and gray dots on a white background were apprehended; (2) the accuracy with which black and gray letters on a white background were apprehended; (3) the speed with which text printed in black on white and in white on gray was read; and (4) the speed with which a test object in black on white, white on black, black on gray, and white on gray was discriminated. The few exceptions appear to be adequately explained by other factors, such as the nature of the test object or printed character, the size of the printed symbol, or the intensity of illumination.

Lag of Visual Sensation

Several investigations have dealt with the influence of brightness (luminosity) and color on the lag of visual sensation. The data reported in these experiments yield indirect evidence concerning the influence of differences in luminosity contrast between symbol and background on visual apprehension.

In one of the most carefully controlled experiments in this field, Bills (3) made determinations by previously used methods as well as by an improved method devised at the Bryn Mawr laboratory. In her own study she determined the time in seconds required for sensation to rise to its maximum value when the stimuli (red, yellow, green, blue, and white lights) were all photometrically equal, and also when different luminosities of the same color were employed. In Bills' experimental situation, the lights to be judged appeared on, and completely covered, a small white surface. This white surface rested on a black velvet background. The intensity of the light stimulus was therefore the same as the amount of contrast between the white surface and the black background: any change in the intensity of the lights also produced a change in luminosity contrast between the illuminated surface and its background.

Bills' results by Method 3 are summarized in Table II. The lights employed to produce the sensations under investigation were held constant at .057, .150, or 1.21 meter-candles. The durations of visual lag are given in Columns 2, 3, and 4.

TABLE II

Time in Seconds Required for Sensation to Rise to Its Maximum Value (from Bills, Method 3)

                Time of lag when intensity of light
                in meter-candles is constant at:
Color used      .057      .150      1.21
Red             0.164     0.148     0.100
Yellow          0.103     0.086     0.085
Green           0.190     0.146     0.111
Blue            0.210     0.134     [illegible]
White           0.216     0.114     0.105

For example, it took the red sensation with a photometric value of .057 meter-candles 0.164 seconds to rise to its maximum value; at .150 meter-candles, 0.148 seconds; and at 1.21 meter-candles, 0.100 seconds. The data in this table reveal an almost universal tendency for an increase in brightness (which here is equivalent to brightness contrast) to increase the speed with which a sensation rises to its maximum value. The single exception is yellow, whose visual lag is approximately the same at 1.21 and .150 meter-candles. "With increase of intensity of light," therefore, "there was a decrease in the time required to produce the maximum response."

Many other studies cited by Bills yield abundant evidence that supports her findings. Exner, Lough, Martius, Broca and Sulzer, McDougall, and Buchner* all found that the time required for visual sensation to reach its maximum value varied inversely with the intensity of the stimulus light.

*See Bills (3) for a comprehensive review of these experimental results.

The results of McDougall (16) are typical and are listed below:

Intensity values    Sensation lag
1 unit              0.049 second
1/2                 0.055
1/4                 0.061
1/8                 0.066
1/16                0.078
1/32                0.089
1/64                0.100
1/128               0.127
1/256               0.142
1/512               0.150
1/1024              0.183
1/2048              0.200


These results of McDougall reveal a constant increase 
in duration of visual lag with decrease in intensity of 
white sensation. 
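Each successive intensity in McDougall's list is half the one before it, so the trend can be checked by using the number of halvings (0 for 1 unit, 11 for 1/2048) as the predictor. The sketch below is our own summary and not McDougall's analysis. It confirms that the lag rises at every step, then fits a straight line of lag against the number of halvings. The linear fit is an assumption made only to express an average rate of increase. The actual increments are not uniform.

```python
# McDougall's values as listed above; intensity is halved at each step.
lags = [0.049, 0.055, 0.061, 0.066, 0.078, 0.089,
        0.100, 0.127, 0.142, 0.150, 0.183, 0.200]  # seconds
halvings = list(range(len(lags)))  # 0 = 1 unit, 1 = 1/2, ..., 11 = 1/2048

def is_strictly_increasing(values):
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))

def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

print(is_strictly_increasing(lags))                        # True
print(round(least_squares_slope(halvings, lags), 4))       # 0.0139 (seconds per halving)
```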

The consistency of the evidence in these experiments justifies the conclusion that the greater the brightness contrast between a spot of light and its background, the shorter the time required for sensation to reach its maximum value. This principle of visual lag therefore appears to be one of the important determinants of visual apprehension. It must be considered one of the more important factors that help to explain the effect of brightness difference between figure (symbol) and background on visual apprehension, speed of reading, perceptibility of words, and other similar perceptual processes.

While the influence of stimulus brightness on visual lag appears to be conclusively demonstrated, the evidence concerning the effect of hue on lag is not at all final. Whenever the colored stimuli used are not equated for brightness, experimental results show that the brighter colors yield the least sensation lag. If the colors are photometrically equal, however, varying results have been reported. McDougall, for example, found that colored lights of equal photometric value all required the same time to produce their maximum effect of sensation. Bills, however, with lights photometrically equal at each of three different intensities, found that the times required for the various colored sensations to rise to their maximum value were all different. A summary of her results by Method 3 is given in Table II. At present, therefore, one cannot state with any assurance that hue has a consistent influence on visual lag.

Evidence from the various sources cited indicates that differences in visual apprehension and perception of both colored and achromatic symbols are probably due, at least in part, to lag of visual sensation, which is produced by variation in luminosity contrast between the character to be apprehended and its background.

Both direct and indirect evidence from a number of related investigations warrants the following conclusions concerning the comparative potency of hue and luminosity of color on visual apprehension and perception of symbols: (a) Hue of color has little or no effect on apprehension and perception. (b) The luminosity contrast between the symbol to be apprehended and its background has a large and very important influence on apprehension and perception. (c) Lag of visual sensation, which is due to brightness contrast between symbol and background, probably explains to a large degree the differences obtained in visual apprehension and perception of symbols varying in color and luminosity.



V 

SUMMARY AND CONCLUSIONS 

Because color and color combinations are widely 
employed in situations designed to convey messages by 
means of visual symbols, there is need of adequate in- 
formation concerning the influence of color on visual 
apprehension and perception. Whenever colored letters, words, or other symbols are used on either a white
or a colored background in printing advertisements, 
constructing automobile license plates, and the like, 
there is danger that the words or other symbols may 
lack adequate visibility. Other things being equal, the 
color combinations which favor quick and accurate 
apprehension of printed characters in all perceptual 
situations should be chosen. With a knowledge of the 
comparative legibility of symbols involving various 
color combinations, the printer will be able to make 
use of colors to obtain affective value or attention 
value, and, at the same time, maintain adequate visibility of textual material.

In this investigation the effect of color on visual ap- 
prehension and perception has been studied by an 
analysis of the apprehension scores for (1) homogen- 
eously colored letters in which all symbols on any 
stimulus card were of the same color, and (2) hetero- 
geneously colored letters in which each succeeding let- 
ter on any stimulus card was of a different color. The 
influence of affective value, attention value, and luminosity of colors on apprehension and perception of symbols was included in the analysis. Other features of the experiment included determinations of (1) the effect of scoring methods on average span of visual apprehension and on reliability of scores, and (2) the influence of letter position on visual apprehension.

There were eight letters on each stimulus card, with 32 cards in the homogeneous series (4 of each color) and 24 cards in the heterogeneous series. The eight colors employed were: black, orange, violet, blue, red, neutral gray, green, and yellow. A group of 100 university students (50 men plus 50 women) served as subjects for the experiment with homogeneous colors, and 100 university men and 100 university women served as subjects for the investigation involving heterogeneous colors. The exposure interval was three seconds for all series. Color preferences were obtained for the eight colors by the method of paired comparisons.
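With eight colors, the method of paired comparisons presents each of the 8 × 7 / 2 = 28 possible pairs and records which member of the pair is preferred. Each color's preference score is then the number of times it was chosen. The sketch below illustrates only that tallying step for a single judge. The judge's ranking is entirely hypothetical and is not the preference order found in this study. Presentation order, left/right position, and pooling over the 100 or 200 subjects are not modelled.

```python
from collections import Counter
from itertools import combinations

COLORS = ["black", "orange", "violet", "blue", "red",
          "neutral gray", "green", "yellow"]

def paired_comparison_scores(colors, prefer):
    """Present every pair once and count how often each color is chosen.

    `prefer(a, b)` must return whichever of a or b the judge prefers.
    """
    wins = Counter({color: 0 for color in colors})
    for a, b in combinations(colors, 2):
        wins[prefer(a, b)] += 1
    return wins

# Hypothetical judge whose choices follow a fixed ranking (illustration only,
# not the preferences reported in this study).
RANKING = ["blue", "red", "green", "violet", "orange",
           "black", "yellow", "neutral gray"]

def ranked_preference(a, b):
    return a if RANKING.index(a) < RANKING.index(b) else b

scores = paired_comparison_scores(COLORS, ranked_preference)
print(len(list(combinations(COLORS, 2))))  # 28
print(scores.most_common(3))  # [('blue', 7), ('red', 6), ('green', 5)]
```

In a real study, the per-judge tallies would be summed over all subjects before the colors are ranked.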

To determine the influence of scoring method on span of visual apprehension, the responses were scored in three ways: (1) average span computed from scores in which a credit of one was given for each item reproduced correctly and in the right place, plus a credit of one-half for each item reproduced correctly but out of place in the series (Method I); (2) average span calculated from scores in which each item correctly reproduced received a credit of one, irrespective of whether or not it was in the right place (Method II); (3) average span derived from scores in which a credit of one was given for each item correctly reproduced and in the right place, but no credit for items reproduced correctly but in a wrong position (Method III).
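The three scoring rules can be sketched for a single card as shown below. This rests on several assumptions that are not stated in the text. A response is recorded as a string aligned with the eight stimulus positions. The blank marker "-" and the example card are hypothetical. The letters on a card are distinct. "In the right place" means the same serial position.

```python
from collections import Counter

BLANK = "-"  # hypothetical marker for a position the subject left empty

def method_iii(stimulus, response):
    """Credit 1 per letter reproduced correctly and in the right place."""
    return sum(1 for s, r in zip(stimulus, response) if r != BLANK and s == r)

def method_ii(stimulus, response):
    """Credit 1 per letter reproduced correctly, regardless of place."""
    given = Counter(ch for ch in response if ch != BLANK)
    return sum((given & Counter(stimulus)).values())

def method_i(stimulus, response):
    """Method III credit plus one-half for each correct letter out of place."""
    in_place = method_iii(stimulus, response)
    out_of_place = method_ii(stimulus, response) - in_place
    return in_place + 0.5 * out_of_place

card = "BKRTMZPL"     # hypothetical eight-letter stimulus card
answer = "BKTRM---"   # hypothetical response: R and T transposed, last three omitted

for name, method in [("I", method_i), ("II", method_ii), ("III", method_iii)]:
    print(name, method(card, answer))
# I 4.0
# II 5
# III 3
```

Because Method I adds half-credit to the Method III score, it always falls between Methods III and II for any single response.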
Scoring Method III yielded the smallest average span, 4.98 letters; Method I came next with a score of 5.45; and Method II produced the largest span, 5.98. The directions of the differences between averages are very stable. The difference between either I and II or I and III is approximately one-half an item, and that between II and III is about one item, on the average.
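Taking the three averages as reported (with the Method I value read as 5.45), the pairwise gaps can be checked directly. The sketch shows that Method I lies roughly midway between the other two. The two gaps involving Method I are each about one-half item, and the gap between Methods II and III is about one item.

```python
# Average spans reported for the three scoring methods (letters per card).
spans = {"I": 5.45, "II": 5.98, "III": 4.98}

for higher, lower in [("II", "I"), ("I", "III"), ("II", "III")]:
    print(f"{higher} - {lower}: {spans[higher] - spans[lower]:.2f}")
# II - I: 0.53
# I - III: 0.47
# II - III: 1.00
```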

Scores in Method I correlated very highly with both II and III, the coefficients being .920 and .973, respectively. The correlation of .916 between II and III is nearly as high.

These results justify the conclusion that, when it is 
desirable to know absolute span of apprehension, 
method of scoring is important. If only relative span 
(position of individual span in the group) is wanted, 
however, any one of the three methods of scoring may be employed, with a slight preference for Method I.

Reliability was computed by correlating the sums of the odd scores with the sums of the even scores and then applying the Brown-Spearman "prophecy" formula. The raw coefficients ranged from .786 to .908 in Method I, from .798 to .866 in Method II, and from .768 to .819 in Method III. Methods I and II therefore have approximately the same reliability, which is slightly higher than that of Method III. All reliability coefficients are relatively high, which indicates that all three methods of scoring yield results with high internal consistency. Although Method I appears to be slightly more satisfactory, any one of the three scoring methods may be employed and still achieve an adequate measure of visual apprehension.
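The odd-even procedure can be sketched as follows. First, correlate each subject's total on the odd-numbered cards with the total on the even-numbered cards. Then step the half-test correlation up with the Spearman-Brown formula r_full = 2r / (1 + r). The sketch assumes that "odd" and "even" refer to alternate stimulus cards in presentation order. The text does not say whether the reported coefficients are before or after the correction, so the sketch returns both. The four subjects' scores are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def spearman_brown(r_half, k=2):
    """Step up a half-test correlation to a test k times as long."""
    return k * r_half / (1 + (k - 1) * r_half)

def split_half_reliability(card_scores):
    """card_scores: one list of per-card scores per subject, in card order."""
    odd = [sum(s[0::2]) for s in card_scores]   # cards 1, 3, 5, ...
    even = [sum(s[1::2]) for s in card_scores]  # cards 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return r_half, spearman_brown(r_half)

# Hypothetical scores of four subjects on four cards (illustration only).
subjects = [[5, 6, 5, 6], [4, 4, 5, 4], [6, 7, 6, 6], [3, 4, 4, 3]]
r_half, r_full = split_half_reliability(subjects)
print(round(r_half, 2), round(r_full, 2))  # 0.92 0.96
```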

Letter position in a stimulus series was found to have 
a definite effect on visual apprehension. From left to 
right there was a decrease in the average number of 
letters correctly reproduced in each succeeding letter 
position through the seventh, and then a slight increase 
in score at the last position. The decrease from the 
first to the fourth position was constant and gradual. 
There were rapid drops in score from the fourth to 
fifth and from sixth to seventh positions. These rapid 
drops together with the increase in score at position 
eight produced marked irregularities of trend from 
position to position in the series.

In the homogeneous color series neither affective 
value nor attention value of colors influenced appre- 
hension of letters. Luminosity of colors, however, had 
some effect on apprehension. In general, the greater 
the luminosity of the colored letters, the smaller the 
perceptual span. A marked example of this tendency 
was the consistently low score for yellow letters. Two 
striking exceptions to this generalization were the low 
spans for black and blue, both of which have a low 
percentage of luminosity. The correlation between luminosity and apprehension score is -.351, but this becomes -.901 if the results for black and blue are omitted.

The results for the heterogeneous color series revealed more consistent trends. There appeared to be only slight sex differences in apprehending colored letters or in color preferences. In the analysis of the differences in scores for the various colors, affective value of color showed no correlation with apprehension score. Attention value, however, correlated +.524 with apprehension score: the greater the attention value of a color, the greater the apprehension score for that color. Luminosity showed a definite effect on apprehension of heterogeneously colored letters. The correlation of -.667 shows that, in general, the greater the luminosity, the smaller the score. These effects of attention value and relative luminosity are apparently independent of each other, for their intercorrelation is -.130.

The heterogeneous arrangement of colored letters permitted the influence of luminosity on visual apprehension to become more definite and consistent than
with homogeneous colors. Since all colored letters 
were on a white background, the general trend in both 
arrangements of stimuli was for a larger apprehension 
score to occur with the greater luminosity contrasts 
between color and background. 

Span of visual apprehension for colored letters in 
the heterogeneous arrangement of stimuli was slightly 
smaller than in the homogeneous series. 

The related investigations are of three types: (1) apprehension and perception of colored stimuli; (2) apprehension and perception of achromatic stimuli; and (3) lag of visual sensation.

A survey of the first group of experiments revealed that hue of color apparently has little or no effect on apprehension and perception. The important factor influencing apprehension and perception in reading seemed to be the luminosity difference between the symbol to be apprehended and the background upon which it was printed. This was found to be true for (1) the accuracy with which colored dots on a gray background were apprehended; (2) the speed with which material printed with colored ink on colored paper was read; (3) the distance at which words printed in colored ink on colored paper were perceived; and (4) the accuracy with which colored numbers on colored and white paper were apprehended.

The summary and analysis of studies dealing with 
the effect of brightness combinations on visual appre- 
hension and perception in reading showed again that 
the luminosity difference between the symbol to be 
apprehended and the background upon which it was 
printed was an important determinant of apprehension. 
This was found to hold for (1) the accuracy with 
which black and gray dots and letters on white backgrounds were apprehended; (2) the speed with which
text printed in black on white and white on gray was 
read; and (3) the speed with which a test object in 
black on white, white on black, black on gray and 
white on gray was discriminated. 

Evidence from several experimental studies indicates 
that differences in visual apprehension and perception 
of both colored and achromatic symbols are probably 
due, at least in part, to lag of visual sensation which 
is produced by variation in luminosity contrast be- 
tween character to be apprehended and background 
upon which it is printed. 

Both direct and indirect evidence from a number of related experiments warrants the following conclusions concerning the comparative potency of hue and
luminosity of colors on visual apprehension and per- 
ception of symbols: (1) Hue of color has little or no 
effect on apprehension and perception except with 
heterogeneous color series in which attention value of 
colors is one important determinant of apprehension. 
(2) The luminosity contrast between symbol and back- 
ground has the greatest influence on apprehension and 
perception. (3) Lag of visual sensation, which is due 
to brightness contrast between symbol and background, 
probably explains to a large degree the differences 
obtained in visual apprehension and perception of 
symbols varying in color and luminosity. 

Results obtained in the present experiment and those 
reported in related investigations show close agreement. 
The all-important determinant of visual apprehension 
and perception of printed words, letters, and similar 
symbols is the luminosity contrast between character 
and background. 

These results have a direct bearing upon special 
printing situations such as advertising and posters 
where colors are employed for attention value, affective 
value, and the like. In any situation of this kind where 
quickness of perception is essential, care should be 
taken to use a color or brightness combination which 
produces a maximum brightness contrast between
symbol and background. 
THE EFFECT OF COLOR ON VISUAL APPREHENSION AND PERCEPTION

In the main part of this investigation, the effect of color on visual apprehension and perception was determined by means of an analysis of the apprehension scores (1) for homogeneously colored letters, in which all the symbols on any one stimulus card were of the same color, and (2) for heterogeneously colored letters, in which each succeeding letter on any one stimulus card was of a different color. Other parts of the experiment included determinations of (1) the effect of scoring methods on the average span of visual apprehension and on the reliability of the scores, and (2) the influence of letter position on visual apprehension. To determine the influence of scoring method on the span of visual apprehension, the responses were scored by three methods. Scoring Method III, in which credit was given only for items reproduced in the correct order, yielded the smallest average span of apprehension; Method I, in which an additional half-credit was given for each item reproduced in the wrong order, yielded a somewhat larger span; and Method II, in which credit was given for all items reproduced regardless of order, yielded the largest span. The difference between Methods I and II, or between Methods I and III, was about one-half item, and that between Methods II and III about one item, on the average. All the methods correlated highly with one another. It is concluded that scoring method has an important effect on the absolute span but little influence on the relative span of visual apprehension. All the scoring methods had relatively high reliability, but the reliability of Method III was somewhat lower than that of Methods I and II, which were approximately equal to each other. Letter position in a stimulus series was found to have a definite effect on visual apprehension. From left to right there was a decrease in the average number of letters correctly reproduced in each succeeding letter position through the seventh, and then a slight increase at the last position. In the homogeneous color series, neither the affective value nor the attention value of the colors influenced apprehension of the letters. The greater the luminosity of the colored letters, however, the smaller the perceptual span. In the heterogeneous color series, the affective value of the colors showed no correlation with apprehension scores, but attention value gave a positive correlation and luminosity a negative correlation with apprehension scores in this arrangement of colors. Evidence from several experimental studies indicates that differences in visual apprehension and perception of colored and achromatic symbols are probably due, at least in part, to lag of visual sensation, which is produced by variation in the luminosity contrast between the letter to be apprehended and the background on which it is printed. The results obtained in this experiment and those reported in related investigations are in close agreement. The most important determinant of visual apprehension and perception of printed words, letters, and similar symbols is the luminosity contrast between symbol and background.

TINKER
THE EFFECT OF COLOR ON VISUAL APPREHENSION AND PERCEPTION

(Abstract)

In the main part of this investigation, the effect of color on visual apprehension and perception was determined by means of an analysis of the apprehension scores obtained (1) with homogeneously colored letters, in which all symbols on a given stimulus card had the same color, and (2) with heterogeneously colored letters, in which each of the successive letters on a given stimulus card had its own color. The investigation also included (1) determinations of the effect of scoring methods on the mean span of visual apprehension and on the reliability of the scores, and (2) determinations of the effect of letter position on visual apprehension. To determine the effect of scoring method on the mean span of visual apprehension, the responses were scored in three ways. Scoring Method III, in which the subject was credited only with those letters repeated in the correct arrangement, yielded the smallest mean span; Method I, in which an additional half-point was credited for each letter repeated in the wrong arrangement, yielded the next smallest span; and Method II, in which the subject was credited with all letters repeated, without regard to arrangement, yielded the largest span. The difference between Methods I and II or between Methods I and III averaged about half a point, and that between Methods II and III about one point. All the methods were highly intercorrelated. It was concluded from this that the scoring method has a strong effect on the absolute span of visual apprehension but only a weak effect on the relative span. All scoring methods proved highly reliable, but the reliability of Method III was somewhat lower than the reliabilities of Methods I and II, which were nearly equal. The arrangement of the letters in a stimulus series was shown to exert a definite effect on visual apprehension. From left to right there was a reduction in the mean number of correctly repeated letters in each successive letter position through the seventh, and then a slight increase in the last position.

With the homogeneous color series, neither the affective value nor the attention value of the colors influenced the apprehension of letters. The greater the luminosity of the colored letters, however, the smaller the span. With the heterogeneous color series, affective values likewise showed no relation to the apprehension score. With this arrangement of colors, however, attention value gave a positive and luminosity a negative correlation with the apprehension score. Evidence from several experimental investigations indicates that the differences in visual apprehension and perception of both colored and achromatic symbols are probably to be traced, at least in part, to a lag of visual sensation, a lag caused by variation of the luminosity contrast between the letter to be apprehended and the background on which it is printed. There is fairly close agreement between the findings of this investigation and those reported from related investigations. The most important factor in determining visual apprehension and perception of printed words, letters, and similar symbols is the luminosity contrast between symbol and background.
I

INTRODUCTION 


The experimental work reported in this paper deals exclusively with the problem of reliability; the theoretical discussion is concerned with both reliability and validity. Both of these problems are of fundamental importance for a science of animal behavior: knowledge of validity enables investigators to interpret properly whatever experimental results are secured, and knowledge of reliability enables them to get results sufficiently free from the operations of chance that the data can illustrate whatever principles are at work.

From certain standpoints it may appear that detailed 
quantitative studies such as this are premature. It is 
probably quite true, as Kohler maintains, that psychol- 
ogy is so young a science that there is still considerable 
need for exploratory experiments of a qualitative type 
and that any attempt now in many fields to secure great 
precision of measurement would result only in the cali- 
bration of experimental procedures with relatively 
slight potentialities for scientific research. It may even 
prove that, for the study of learning in animals, maze experiments are quite unsatisfactory in comparison with some other methods, such as conditioned reflex experiments. However, several considerations encourage the student of the reliability of maze experiments. In the first place, various investigations, such as Tryon’s study (33) of the inheritance of learning ability and Maurer and Tsai’s study (19) of the influence of vitamin B deficiency during the nursing period on later learning ability, have seemed to indicate that there are certain valid purposes which maze experiments can serve, such as the study of heredity, the effect of drugs on learning, and so on, all of which depend upon quantitative comparisons. It remains to be demonstrated that maze learning is less suited for such studies than other types of learning experiments. In the second place,
development of statistical practices in the field of maze 
experiments should be of advantage in other fields of 
learning study as well. In a sense, the fundamental 
statistical devices used here are not different from those 
developed relative to the problems of the reliability of 
various tests and measurements, but the details of application introduce a number of new and puzzling problems.

The purpose of the present study, then, is to throw 
light on the problem of how to secure dependable quan- 
titative results in studies of learning, and especially in 
studies of maze learning with white rats. As a means 
to this end, the aim has been, first, to make an analysis 
of the logical and statistical problems of reliability and 
validity as they apply to maze experiments, and, sec- 
ondly, to make a critical experimental examination of 
the reliability of experiments using the multiple-T 
maze, particularly with respect to the effects of permit- 
ting various degrees of retracing and with respect to 
the feeding program used. In seeking to discover the 
reliability of this maze, reliability coefficients have been 
calculated by test and retest [a method recently used 
by Heron (6) under one particular arrangement, and 
by Tryon (30) under another particular condition],
as well as by several of the methods commonly used to 
measure the consistency of performance of rats within 
the training on a single maze. 

The theoretical section of this paper does not attempt a complete discussion of the logical and statistical problems of reliability and validity, although such a discussion would not be out of place, since past discussions of the reliability of maze experiments have often been very loosely conducted. The recent
article by Tolman and Nyswander (28), however, has 
furnished an excellent critical review of many phases 
of the problem, and on these particular points I would 
only echo what they have already said. 

In its experimental work and in its methods of treating experimental data, this study is closely related to those by Tolman and Nyswander (28), Stone and Nyswander (25), and Tryon (30).
In its consideration of the theoretical problems of the 
reliability and validity of maze experiments, it de- 
pends not only on the work of those who have contri- 
buted to this problem directly (especially Heron, Hun- 
ter, Stone, Tolman, and Tryon), but also on the de- 
velopments of the problems of reliability and validity 
in the fields of intelligence tests and educational tests, 
where these problems have received a much earlier 
and more mature treatment. 

The study was conducted at Clark University in 
1928-1930 under the direction of Dr. W. S. Hunter, and 
with advice and suggestions from Dr. Vernon Jones 
and Dr. R. R. Willoughby on a number of statistical 
points. 



II 

HISTORICAL REVIEW 

The statistical study of the reliability of experiments 
on animal behavior may be said virtually to have started 
in 1917 with Paterson’s review (22) of three articles by Bassett, Hubbert, and Ulrich. Paterson showed
that, in spite of the quite positive conclusions all three 
experimenters had drawn, only Ulrich’s conclusions 
seemed justified when the reliability of the group dif- 
ferences was examined in terms of the ratio of the dif- 
ference to the probable error of the difference. Since 
the publication of Paterson’s article (though, of course, 
not primarily because of it, but because of the develop- 
ment of statistical procedures in other fields of psychol- 
ogy) an increasing proportion of studies of animal 
behavior have availed themselves of the statistical 
measures of reliability. It is still true, however, that 
too many studies are to some extent vitiated by the neg- 
lect of statistical treatment. 
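Paterson's test can be made concrete with a short sketch. It assumes two independent groups and the normal-curve relation in use at the time, probable error = 0.6745 × standard error; the function names and scores are hypothetical illustrations, not data from any of the studies cited.

```python
import math
from statistics import fmean, pstdev

# Probable error = 0.6745 * standard error under the normal curve.
PE_FACTOR = 0.6745


def pe_of_mean(scores):
    """Probable error of a group mean, using the population sigma of the scores."""
    return PE_FACTOR * pstdev(scores) / math.sqrt(len(scores))


def diff_to_pe_ratio(group_a, group_b):
    """Difference between two independent group means divided by its probable error."""
    pe_diff = math.sqrt(pe_of_mean(group_a) ** 2 + pe_of_mean(group_b) ** 2)
    return (fmean(group_a) - fmean(group_b)) / pe_diff


# Hypothetical error scores for two small groups of rats.
control = [2, 4, 4, 4, 5, 5, 7, 9]
experimental = [5, 7, 7, 7, 8, 8, 10, 12]
ratio = diff_to_pe_ratio(experimental, control)
print(round(ratio, 2))  # 4.45
```

A ratio of about 4 (roughly 2.7 standard errors) was a common criterion of the period for a dependable difference; that threshold is a convention assumed here, not one this section states.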

One indirect suggestion in Paterson’s article was that 
the maze procedures used might be too unreliable to 
permit satisfactory measurement of any ordinary ex- 
perimental effects, and that, consequently, the first maze 
problem to attack was that of developing more reliable 
methods in maze experiments. Paterson himself began 
such experiments, but the results were not published. 
With Paterson’s permission, Hunter and his students
then took up the problem. The data from these early ex-
periments showed very low reliabilities for current
maze techniques, and served to awaken animal
experimenters to a realization of the fact that with the type of
maze pattern and maze method in general use at that
time there could be little hope of ever securing depend-
able results. The highest reliability coefficients of these
studies were generally in the .30’s and .40’s. These
studies include Heron’s (4) studies of the inclined 
plane problem-box and the Watson circular maze with 
rats in 1922, Heron’s (5) study of the stylus maze in 
1924, Hunter’s (7) study of two stylus mazes with 
human subjects and of three mazes of different com- 
plexity with rats in 1922, Hunter and Randolph’s (9) 
study of the reliability of several stylus mazes and of 
the intercorrelation between records on a maze, a 
straightaway and a problem box with rats in 1924, 
Hunter and Randolph’s (10) study of the reliability 
of a very simple maze with the goat in 1926, Liggett’s 
(16) study of two simple mazes with the chick in 1926, 
and studies by Tolman (26) in 1924 and Tolman and 
Davis (27) in 1924 of several relatively simple mazes 
with rats. 

A comparison of these earlier experiments with more 
recent ones yielding higher coefficients seems to indi-
cate that the features of the early experiments respon-
sible for the low reliability coefficients were: (1) the
fact that the mazes were too simple and easy, (2) the 
lack, in most cases, of preliminary training to accus- 
tom the animals to the apparatus and handling and 
to develop stronger motivation, (3) poor control of 
motivation, (4) the use, in some cases, of mazes with 
alleys of such unequal complexity that chance blun-
derings into certain alleys offered much greater hin-
drances to learning than blundering into others, (5) the
lack of means of preventing retracing, and (6) utiliza-
tion of too few trials to furnish the data correlated. In
some experiments, for instance, the correlated scores 
were errors on different single trials; in other cases, 
groups of only three trials were correlated. Just as 
with psychological tests, the reliability coefficients from 
such small units are generally lower than from units 
providing more chances for success or failure. 
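The effect of unit size can be estimated with the Spearman-Brown formula, a standard psychometric result that this section does not itself invoke. The sketch assumes the added trials are parallel to the original unit, which learning trials, whose difficulty changes with practice, satisfy only roughly; the .40 starting value is chosen to match the range reported for the early studies.

```python
def spearman_brown(r, k):
    """Predicted reliability when a measure is lengthened k-fold with parallel units."""
    return k * r / (1 + (k - 1) * r)


# A three-trial unit with a reliability of .40, lengthened to 2, 4 and 6 such units.
for k in (1, 2, 4, 6):
    print(k, round(spearman_brown(0.40, k), 3))
```

Under that assumption, pooling six three-trial units (18 trials) raises a predicted .40 to .80.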

Within the last three or four years a number of
studies of maze reliability have been reported with 
much more positive results. The first of these was the 
exploratory experiment of Tolman and Nyswander 
(28) in which, of the variety of patterns used, the
multiple-T maze since used by Stone and Nyswander (25),
Heron (6), and by the present experimenter, yielded 
definitely the highest reliabilities. [It is interesting to 
note that in 1922 Hunter (7) had found a simple T 
maze the most reliable of the three mazes he used.] 
Stone and Nyswander (25) have given this simple T 
maze a much more extensive test in connection with 
their study of the influence of age on learning. Re-
liability coefficients, figured on eight groups of about
25 rats each, were about .80 to .90. Some
doubt, however, is thrown upon these figures by a re- 
cent study by Heron (6). Duplicating Stone's tech- 
nique, in general, Heron found quite as high reliability 
coefficients from correlation of different parts of the 
record of the original learning or of a relearning; but, 
on retesting his groups after intervals of 221 and 175 
days (with two groups of 36 and 54 rats, respectively),
he found that the correlations between scores in the
original learning and scores in relearning were .376 ± .07
and .326 ± .06. Within that period of time, of
course, accidental factors may have affected differently 
the learning ability of different rats, so that perhaps the 
correlations are legitimately lower, as one might say. 
But Heron's experiment seriously raises the question as 
to whether Stone and Nyswander’s reliability coeffi-
cients may not have been so high partly through the ex-
istence of certain factors essentially of the nature of
systematic errors which tend to produce the high in- 
ternal-consistency correlations. For example, system- 
atic differences between the animals in feeding or 
emotional conditioning to handling and to the appara- 
tus might persist through a 30-day period of training 
and so produce a consistency of performance as be- 
tween the different parts of the original learning or of 
the second learning; but they might not persist through 
as long a rest interval as in Heron’s experiment.

The experiment by Tryon (30, 33rt) illustrates rather 
clearly the possibility of this source of error. On the 
first of the two mazes on which his rats were trained, 
the procedure was adopted of feeding all animals of 
the same sex the same quantity of food, regardless of 
the fact that the age range was from 3 to 8 months and 
that the food requirements of the different rats must 
have varied widely. It would seem that, inevitably,
the consequence of this feeding program would tend
to be a consistently different rate of learning for dif-
ferent rats based on the differences in motivation.
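The kind of systematic error suspected here can be illustrated by simulation. This is a sketch under stated assumptions, not a model of Heron's or Tryon's experiments: each hypothetical rat has a true maze ability plus a motivational offset (standing in for unequal feeding) that persists through one training period and is drawn afresh after a long rest; all parameter values are invented.

```python
import math
import random


def pearson(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def split_half_vs_retest(n_rats=2000, motivation_sd=1.0, seed=7):
    """Return (internal-consistency r, test-retest r) for simulated scores."""
    rng = random.Random(seed)
    first_half, second_half, retest = [], [], []
    for _ in range(n_rats):
        ability = rng.gauss(0, 1)                 # the trait we want to measure
        motivation = rng.gauss(0, motivation_sd)  # persists within one training period
        first_half.append(ability + motivation + rng.gauss(0, 1))
        second_half.append(ability + motivation + rng.gauss(0, 1))
        # After a long rest the motivational offset is drawn afresh.
        new_motivation = rng.gauss(0, motivation_sd)
        retest.append(ability + new_motivation + rng.gauss(0, 1))
    return pearson(first_half, second_half), pearson(first_half, retest)


internal, retest = split_half_vs_retest()
print(round(internal, 2), round(retest, 2))
```

With these settings the two halves of one training period correlate near .67 and the retest near .33, although only the ability term is of interest; with `motivation_sd=0.0` both come out near .50. The internal-consistency figure is inflated by exactly the factor that a long rest removes.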

It should be added with regard to Tryon’s experi-
ment, however, that, with a group of 107 rats, training
was not only given on his first (mechanical) maze, but 
also on a second (hand-operated) one; that on this sec-
ond maze the feeding program was quite different; and
that, nevertheless, high correlations were found between 
the scores on the two mazes. On the second maze, 
Tryon says, “. . . the procedure of letting each animal
eat until he first turned away from his food pan was 
adopted.” In spite of this change of feeding program, 
when errors from groups of three trials on the first 
maze were correlated with errors on groups of three 
trials on the second (with 20 trials given on each maze 
and the first trial and the twentieth dropped), the re- 
sulting raw correlation coefficients ranged from .318 
to .772, with the median coefficient .608. When the 
successive groups of three trials on the first maze were 
correlated with Trials 2-19 on the second, the correla-
tion coefficients were .470, .639, .704, .749, .773, and
.793. The correlations between the successive groups
of three trials on the second maze with Trials 2-19 on 
the first maze were .758, .709, .712, .600, .602, and .585. 
The change of feeding technique and the 7 or 8 days’
period between the two periods of trials quite materi- 
ally reduce the danger that these correlations have been 
raised by systematic errors of feeding. Nevertheless, 
one still cannot help noticing that there is a consistent 
tendency for the correlations to be higher the fewer 
the days separating the correlated trials. This seems 
rather definite proof that there must still remain a ten- 
dency for adjoining groups of trials to be affected by 
common factors other than maze learning ability, be-
cause it certainly would not seem that the true ability
of these rats had changed so in the course of the 45 or
46 days involved in these correlations.¹ [This is sup-
ported by Lashley’s (14) findings, which show that an
increase in the time interval between tests brings with 
it an increase in the diversity of errors made and, con- 
sequently, a decrease in the errors common to the tests.] 
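The trend just described can be checked against the twelve coefficients quoted above. The sketch ranks each series of six coefficients against the order of the three-trial groups (Trials 2-4, 5-7, ..., 17-19) using Spearman's rank correlation. Later groups on the first maze lie nearer in time to the second maze, and later groups on the second maze lie farther from the first, so the "fewer days, higher correlation" reading predicts a positive coefficient for the first series and a negative one for the second.

```python
def ranks(values):
    """Ranks from 1 (smallest) upward; assumes no tied values."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman's rank correlation, 1 - 6*sum(d^2) / (n(n^2 - 1)); valid without ties."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


group_order = [1, 2, 3, 4, 5, 6]  # Trials 2-4, 5-7, ..., 17-19
# Three-trial groups on the first maze vs. Trials 2-19 on the second maze.
first_maze = [.470, .639, .704, .749, .773, .793]
# Three-trial groups on the second maze vs. Trials 2-19 on the first maze.
second_maze = [.758, .709, .712, .600, .602, .585]

print(spearman_rho(group_order, first_maze))             # 1.0
print(round(spearman_rho(group_order, second_maze), 3))  # -0.886
```

Both signs come out as predicted. With six values per series this is a descriptive check of the ordering, not a significance test.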

The other recent experiments which have demon-
strated rather satisfactorily high reliability coefficients
are the studies by Husband (11), Liggett (18), and
Yoshioka (35, 36, 37). Each of these experiments used
a different style of maze, and any one of the patterns
might well have been selected for the further investi-
gation of this study. Tryon’s mazes, of all the group,
seem the ones that probably offer the best features; but
his pattern was not taken because the present work
was started before information on his study was
secured. The multiple-T maze was selected because
its relatively high reliability had been demonstrated
with larger groups than were used with the other mazes.

¹Since the above was written, several articles by Tryon have been
published which deal with some of these criticisms. Thus, in “In-
dividual differences in maze ability: II. The determination of indi-
vidual differences by age, weight, sex and pigmentation” (33r/), he
has shown that none of the variables discussed in the article was
significantly related to maze performance. Several correlations of
error scores and weight were figured for the 88 male rats used.
The correlation between weight at the start of training and scores
on the first maze learned was −.09; that between scores on the sec-
ond maze learned and weight at the conclusion of training was −.11.
I still remain skeptical, however, partly because of the data from
the present study that show that reliability coefficients can be raised
by differences of feeding. Moreover, the description provided in
these articles of the retracing doors makes the problem of differences
in emotional reaction appear as possibly quite a significant factor.
The doors were treads inclined at an angle of 45° from the floor,
each suspended by a rubber band in such a fashion that the door
would sink to the floor as the rat walked on it and be jerked back
up as soon as the rat had passed over it. Such an arrangement would
seem to me too liable to produce a differentiating emotional condi-
tioning in various members of the group.

In summing up the work of the past, it clearly seems 
that with different patterns and different experimental 
procedures vastly different results as regards reliability 
can be secured. Some light has been thrown on the 
question of what types of maze pattern and experimen- 
tal procedure produce different reliabilities, but the 
more exact determination of the influence on reliabil- 
ity of different possible variations still offers a field for 
fruitful work. 



III


THE LOGICAL AND STATISTICAL PROB- 
LEMS OF RELIABILITY AND VALIDITY 
AS RELATED TO MAZE EXPERIMENTS 

A. The Formulae for the Reliability of Differences between Group Averages

A considerable part of the discussion about the prob- 
lems of the reliability of maze experiments has been 
concerned with the formula for the standard error of 
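a mean:

$$\sigma_M = \frac{\sigma}{\sqrt{N}}$$

and the formula for the standard error of a difference:

$$\sigma_{M_1 - M_2} = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2 - 2\,r_{12}\,\sigma_{M_1}\sigma_{M_2}} \qquad \text{(I)}$$

or, where the data entering into the determination of
the two means cannot be correlated,

$$\sigma_{M_1 - M_2} = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2} \qquad \text{(II)}$$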

Apparently, Formula I is sometimes regarded as
being a rather questionable statistical refinement of
Formula II, as though Formula II were really the
legitimate formula, and Formula I somewhat on a par
with representing the index of reliability ($\sqrt{r}$) as in-
dicating directly the reliability of a test. The fallacy
of this view is well disclosed in Walker’s article (34).
To her algebraic treatment of the problem, however,
the suggestion might well be added that the general
reason why Formula I (with data which can be cor-
related) is no more radical than is Formula II for its
appropriate data (data which cannot be correlated)
is this: in the case of data which can be correlated,
errors of sampling cannot enter to affect the differ-
ence between means, as they can in the second case.
Thus, suppose that our task is to discover the effect on 
error scores of doubling the feeding time at the end of 
the tenth trial. If we use the same group throughout, 
as naturally would be done, there can be no errors of 
sampling to affect the difference in means of scores of 
the trials preceding and following the tenth trial. If, 
however, we run one group for 10 trials and compare 
its record with that of a second group subsequent to 
the tenth trial, the difference between the means can be 
affected not only by errors of measurement, but also by 
errors of sampling. Hence, in this latter case, we not 
only have to use Formula II, but we find it the correct
formula to use. 
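The distinction between the two formulae can be seen in a short sketch. It computes both for one hypothetical group of eight rats measured before and after the tenth trial (the scores are invented), using population sigmas throughout. Because the same animals enter both means, Formula I is the appropriate one, and it agrees exactly with the standard error of the mean of the rat-by-rat differences.

```python
import math
from statistics import fmean, pstdev


def se_mean(xs):
    """Standard error of a mean: sigma / sqrt(N)."""
    return pstdev(xs) / math.sqrt(len(xs))


def pearson_r(xs, ys):
    """Product-moment correlation using population moments."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))


def se_diff_formula_i(xs, ys):
    """Formula I: the two means come from the same animals, so r enters."""
    a, b = se_mean(xs), se_mean(ys)
    return math.sqrt(a * a + b * b - 2 * pearson_r(xs, ys) * a * b)


def se_diff_formula_ii(xs, ys):
    """Formula II: the two means come from independent groups."""
    a, b = se_mean(xs), se_mean(ys)
    return math.sqrt(a * a + b * b)


# Hypothetical error scores for the same eight rats.
before = [30, 25, 41, 22, 35, 28, 39, 31]  # trials before the change
after = [20, 18, 30, 15, 26, 19, 29, 22]   # trials after the change
diffs = [x - y for x, y in zip(before, after)]

print(se_diff_formula_i(before, after), se_diff_formula_ii(before, after))
print(se_mean(diffs))  # equals the Formula I value
```

Here the positive correlation between the two sets of trials makes Formula I far smaller than Formula II; applying Formula II to such paired data would overstate the standard error, since no sampling error separates the two means.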

Regarding these formulae, the discussion has 
centered upon three main issues: (1) When the relia-
bility of a particular instrument is low, are these form-
ulae sufficiently conservative? (2) Is the situation quite
the opposite, as Tryon has claimed, and is a new
formula needed to correct for too great a conservative-
ness with measurements from unreliable instruments?
(3) Do maze data conform closely enough to a normal
distribution to make means and $\sigma$'s the best
statistical measures of the differences between groups?

The discussion of the first issue mainly concerns the
controversy between Carr (1) and Hunter (8) and
several of his students. The problem has been given a
very satisfactory treatment, however, by Tolman and
Nyswander (28), and need not be gone through again.
These authors point out that the problem is simply one
of practicability: dependable quantitative results
can be secured with instruments of very great unrelia-
bility, but either the difference produced by the
experimental factor will have to be so great, or the
groups will have to be so large, before the ratio
$D/\sigma_D$ reaches satisfactory proportions, that the
discovery of more reliable methods is an urgent
problem. Of
course, this formula is not the only safeguard needed, 
even where the data have fairly normal distribution. 
The existence of any bias in the selection of the samples, 
for instance, might produce groups yielding statistically 
different means. The point is, however, that not only
will the $D/\sigma_D$ formula fail to detect such errors,
but knowing the reliability coefficients of the instrument
will not help either. Such sources of error can be 
guarded against only by the adequate control of non- 
statistical phases of the experiment. 
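Tolman and Nyswander's point about practicability can be put in rough numbers. The sketch assumes the classical true-score model (observed variance = true variance / reliability), two independent groups of equal size compared with Formula II, and a critical ratio of 3; the function name and all numbers are illustrative assumptions.

```python
import math


def animals_per_group(true_diff, true_sigma, reliability, critical_ratio=3.0):
    """Smallest equal group size at which a difference equal to true_diff
    gives D / sigma_D = critical_ratio.

    Assumes observed variance = true variance / reliability and, for two
    independent groups of size N, sigma_D = sqrt(2 * observed_var / N).
    """
    observed_var = true_sigma ** 2 / reliability
    return math.ceil(2 * critical_ratio ** 2 * observed_var / true_diff ** 2)


for r in (1.0, 0.8, 0.5, 0.2):
    print(r, animals_per_group(true_diff=4, true_sigma=10, reliability=r))
```

In this model the required group size is inversely proportional to reliability, so dropping from 1.00 to .50 roughly doubles it (from 113 to 225 rats per group for a 4-point true difference with a true sigma of 10).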

The second issue is the question of the merit of
Tryon's proposed new formula (for $\sigma_{M_1 - M_2}$),
which has won favorable comment from at least several
authors. The consideration of this issue has been
rendered somewhat difficult by the fact that, in the
course of his three articles (29, 31, 32), Tryon has
shifted his position twice, without indicating at all
clearly his renunciation of his earlier different posi-
tions. He at first announced that he was demonstrating
that, when a certain difference was found between two
means, that difference was to be taken as more signifi-
cant the greater the unreliability of the measuring
instruments involved. This was then renounced for the
proposition which his statistical argument all along
had been designed to prove, namely, that when a cer-
tain ratio was found between a difference and the sigma
of that difference, that ratio was to be taken as more
significant the more unreliable the measuring instru-
ment. His third article seems to foreshadow a re-
acceptance of the ordinary formula.² It is the second
of his propositions that calls for discussion here.

²In a still more recent article, Tryon writes thus: “Applied to
maze measures, my notion was, briefly, that when a difference be-
tween mean maze scores of two groups who differ in some systematic
way has been found to be statistically reliable by gauging it in terms
of the orthodox P.E.diff. formula, the reliability of this difference is
unaffected by knowledge of the reliability coefficient within each
group” (33ff, p. 156). It is interesting to know that this is what Tryon
meant by his earlier articles; when one reads the articles in question
one tends rather to conclude that he has fundamentally changed
his opinions.

In criticizing Tryon’s arguments, it is helpful to
get a picture of how the ordinary formula $D/\sigma_D$
operates and of its relationship to the reliability of the
test. First, it is to be noted that, with greater and
greater unreliability, the difference found between two
groups may vary more and more from the true dif-
ference than would be the case with a perfectly reliable
instrument. Thus, if the true difference is 4 points,
with a measuring instrument of a reliability of .80 one
might occasionally find differences of 2 or 6 points
rather than 4; and with a measuring instrument with a
reliability of .50, differences of −1 and 9. This can
be readily realized if two groups whose true means are 
absolutely equal are considered. With a perfectly 
reliable test, no difference would be found between the 
means, but with tests of greater and greater unrelia- 
bility there would tend to occur either positive or 
negative differences of greater and greater size. There 
is, accordingly, no safeguard in the numerator of the
$D/\sigma_D$ formula to guard one from accepting as true

those differences that tend to be farther and farther 
from the true values as the measuring device becomes 
more and more unreliable. In the denominator of the 
formula, however, there is such a safeguard — namely, 
that, as the reliability decreases, not only does the mean 
tend to deviate farther and farther (one direction or 
the other) from the true position, but also the sigma of 
the distribution tends to be increased. It is generally 
held that the size of the $\sigma$ tends to increase with un-
reliability at a rate which compensates for the tendency
of the mean (because of unreliability in measurement) 
to assume positions away from the true mean. The 
essence of Tryon’s suggestion was, however, that instead
of using the $\sigma$ directly in the $D/\sigma_D$ formula, a
formula derived by Kelley for estimating the probable
true sigma,

$$\sigma_{\text{true}} = \sigma\sqrt{r_{11}},$$

should be used, thus giving the formula:

The values achieved by this formula will be the
same no matter what the reliability of the measuring
instrument, Tryon said (31, p. 4), inasmuch as the
formula “. . . . contains the true sigmas of the given
groups which are necessarily constants, otherwise, they
would not be 'true.'” Such being the case, one would
accept a difference between means of 4 points, let us
say, as being just as reliable when secured with tests
with a reliability of .20 as when secured with tests with
a reliability of 1.00. But, as we have seen, the more
unreliable the test the greater is the probability that the
found difference between the means may be either
greater or less than that which actually should be
indicated.

The difficulty in Tryon's reasoning was indicated
when he specified his reasons for preferring the formula
for the true sigma rather than the ordinary $\sigma$ for use
in the $D/\sigma_D$ formula. He said:

“The true sigma is an index of the dispersion in a group
when each individual's score contains absolutely no error
of measurement. Obviously, this is the sigma which we
are fundamentally interested in and is the one to be used
in comparing populations with each other. . . .” (31, p. 2).

The defect in this statement is that, when we are try- 
ing to determine the significance of a mean or of a dif- 
ference between means, it is not the true sigma in which 
we are fundamentally interested, but, on the contrary, 
the actual or found sigma of the distribution, since it is
only this latter sigma which has any influence in the
$D/\sigma_D$ formula in warning against possible displace-
ment of the means from their correct positions be-
cause of the unreliability of the test.
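The argument can be checked with a simulation in which the true group means are equal, so that every "significant" difference is a false one. This is a sketch under stated assumptions: Tryon's proposal is modelled as Formula II with each found sigma replaced by Kelley's estimate $\sigma\sqrt{r}$, the scores follow the classical true-score model, and the group size, sigmas, and critical ratio of 3 are arbitrary choices.

```python
import math
import random
from statistics import fmean, pstdev


def false_positive_rates(reliability, n=25, true_sigma=10.0, trials=1000,
                         critical_ratio=3.0, seed=1):
    """Share of null experiments judged 'significant' by each version of D / sigma_D.

    Both groups have the same true mean, so every 'significant' result is spurious.
    Returns (rate using found sigmas, rate using Kelley true-sigma estimates).
    """
    rng = random.Random(seed)
    # Error sigma chosen so that true variance / observed variance = reliability.
    error_sigma = true_sigma * math.sqrt((1 - reliability) / reliability)
    ordinary_hits = true_sigma_hits = 0
    for _ in range(trials):
        groups = [[rng.gauss(0, true_sigma) + rng.gauss(0, error_sigma)
                   for _ in range(n)] for _ in range(2)]
        d = abs(fmean(groups[0]) - fmean(groups[1]))
        found = [pstdev(g) for g in groups]
        # Formula II with the found (observed) sigmas.
        sd_ordinary = math.sqrt(sum(s * s / n for s in found))
        # The same formula with sigma * sqrt(r) in place of each found sigma.
        sd_true = math.sqrt(sum(s * s * reliability / n for s in found))
        if d / sd_ordinary > critical_ratio:
            ordinary_hits += 1
        if d / sd_true > critical_ratio:
            true_sigma_hits += 1
    return ordinary_hits / trials, true_sigma_hits / trials


for r in (1.0, 0.5, 0.2):
    print(r, false_positive_rates(r))
```

At a reliability of 1.00 the two versions coincide. As reliability falls, the version using the found sigma keeps its rate of spurious "significant" differences small, because the found sigma grows along with the scatter of the means, whereas the true-sigma version flags more and more null differences, which is the defect argued above.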

The third issue which has been raised relative to the 
formula for the critical ratio, and the issue most de- 
serving of further attention, is the question of the 
allowances to be made for the general tendency of maze 
data to deviate markedly from normality of distribu- 
tion. 

Paterson in 1917 called attention to the fact that the 
maze studies he reviewed contained some markedly 
skewed data, and suggested that, in view of this fact, 
the median might have certain advantages over the 
mean as a measure of the central tendency of the group. 
This suggestion has won little application. Probably a 
reason for this is the unwarranted belief on the 
part of so many psychologists that the mean is in every 
case superior to the median as a statistical tool except 
where ease of calculation is the prime desideratum. 
With maze data, as well as with quite a quantity of 
other psychological data, however, the median is really 
the statistically preferable tool on the grounds of 
smallness of standard error. Liggett in 1928 indicated 
the fundamental reason for the unsatisfactoriness of 
the mean relative to maze data:

“When results like this are secured, the curve must
always be skewed, for it is impossible to get low scores
that will counterbalance the high ones, due to the near-
ness of the physiological limit at the lower end of the
scale. A greater number of cases will not change this,
for one may get more high scores, but cannot get scores
small enough to cancel the high ones.

“Quite possibly the maze is a reliable method of
measuring learning, but on account of the peculiar distri-
bution of learning records, the present statistical methods
of treating maze data do not give significant group dif-
ferences. If reliable conclusions can be drawn from
group averages in the maze, it will be after more knowl-
edge is gained of distributions and of methods of treat-
ing skewed curve data.

“Granting that maze records do not distribute upon
the normal curve, the arithmetic mean is not an accurate
measure of central tendency. . . .” (17, pp. 54-55).

Again, in Yule we find:

“For a normal curve the standard error of the mean is to
the standard error of the median as 100 to 125, and in
general the standard errors of the two stand in a some-
what similar ratio for a distribution not differing largely
from the normal form. The mean is in general
less affected than the median by errors of sampling. At
the same time, we also indicated the exceptional cases in
which the median might be the more stable: cases in
which the mean might, for example, be affected consider-
ably by small groups of widely outlying observations, or
in which the frequency-distribution assumed a form re-
sembling fig. 53 (i.e., flat-topped curves), but even more
exaggerated as regards the height of the central 'peak'
and the relative length of the 'tails.' Such distributions
. . . might be expected to characterize some forms of
experimental error. . . .

“Further, in some experimental cases it is conceivable
that the median may be less affected by definite experi-
mental errors, the average of which does not tend to be
zero, than is the mean; this is, of course, a point quite
distinct from that of errors of sampling” (38, pp. 344-
345). (Italics mine.)

The condition mentioned in the last sentence of this
quotation would seem to be the condition with maze
data, as Liggett has pointed out. One cannot expect
that experimental errors on such learning problems as
this will cancel in their effects. The solution of the
matter, however, is probably not as difficult as the
quotation from Liggett suggests. The median can be
taken as a fairly satisfactory measure of central ten-
dency even under these conditions, for it is reasonable to
assume that the majority of the animals will not have
been affected by serious experimental errors and the
value of the median will be determined by this large
majority of unaffected animals.
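The closing argument can be illustrated with invented numbers. In the sketch below, fifteen hypothetical error scores are disturbed by adding a large error to two animals (standing in for "serious experimental errors"); the mean moves with the two disturbed animals, while the median stays with the unaffected majority. The constant $\sqrt{\pi/2} \approx 1.2533$ is the large-sample ratio of the median's standard error to the mean's for a normal distribution, which is Yule's "100 to 125".

```python
import math
from statistics import fmean, median

# Large-sample SE(median) / SE(mean) for a normal distribution (Yule's "100 to 125").
NORMAL_SE_RATIO = math.sqrt(math.pi / 2)

# Hypothetical error scores for 15 rats.
clean = [12, 14, 15, 15, 16, 17, 18, 19, 20, 21, 22, 24, 26, 30, 35]
# The same rats, with two of them (scores 14 and 21) disturbed by 60 extra errors.
disturbed = [s + 60 if i in (1, 9) else s for i, s in enumerate(clean)]

print(round(NORMAL_SE_RATIO, 4))         # 1.2533
print(fmean(clean), fmean(disturbed))    # the mean rises by 8 errors
print(median(clean), median(disturbed))  # 19 20
```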

B. Problems of Validity 

Only limited experimental data exist on the question 
of validity of different methods of measuring learning 
ability in animals, in spite of the fact that such in- 
formation is basic for the accurate interpretation of 
experimental results.³ It may be true that the ability
to eliminate errors or excess time on a maze is indicative 
of learning ability in general, but this cannot be 
assumed without proof. The very favorable maze per- 
formance of rats in comparison with the maze perfor- 
mance of human subjects, coupled with the very obvious 
differences between the two on other learning prob-
lems, suggests that maze-learning ability is certainly
not an index of all types of learning ability. Most ex- 
perimenters on animal behavior, however, instead of 
securing experimental evidence of the validity of their 
measures, have been content to assume that the learn-
ing scores on a maze, or light-discrimination problem,
or problem box, are indicative of “learning ability” in
general. To what extent this is true cannot be estimated
with any assurance until one group of animals has been
tested not only on one such particular problem, but
also on a considerable variety of other learning prob-
lems, and until the correlations have been determined
between the scores on that particular problem and the
scores on the other problems. Each of these correla-
tions would be a validity coefficient of scores on either
problem as a means of indicating the probable scores
that the animals would make on the other problem,
and the series of correlations between scores on one
particular problem and scores on a number of other
problems (varied enough to sample all the fields of
learning) would be the means of estimating the validity
of that particular problem as a means of estimating
“learning ability” in general.

³A recent article by Tryon (33i), “Studies in individual differences
in maze ability. III. The community of function between two maze
abilities,” gives a good summary of the experimental material avail-
able to date on this question.

It is obvious from what is said that a given measur- 
ing instrument may have any number of validities, be- 
cause, after all, a test is not simply valid in some 
abstract sense — tests have validities only relative to 
such and such other things. Thus, a given maze might 
be found to have one validity as an index of the ability 
of rats to learn such and such other varieties of mazes; 
still another validity as an index of ability to solve 
problem boxes; and still a third validity as an index of 
the ability of the animals to make fine visual discrimin- 
ations. 
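The idea that one instrument carries a separate validity coefficient for each criterion can be sketched as a table of correlations. Everything here is hypothetical: six imaginary rats, invented scores on one maze and three other problems, and the function name `validity_profile`; the point is only the shape of the computation the text describes.

```python
import math


def pearson(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def validity_profile(scores, instrument):
    """Correlate one instrument's scores with every other problem's scores."""
    base = scores[instrument]
    return {name: pearson(base, other)
            for name, other in scores.items() if name != instrument}


# Hypothetical scores for the same six rats on four problems.
scores = {
    "maze_a": [10, 14, 18, 22, 26, 30],
    "maze_b": [12, 13, 20, 21, 27, 29],
    "problem_box": [40, 35, 45, 38, 50, 44],
    "visual_discrimination": [8, 12, 7, 11, 9, 10],
}
for name, r in validity_profile(scores, "maze_a").items():
    print(f"{name}: {r:.2f}")
```

With these invented numbers maze A has three quite different validities: high against the second maze, moderate against the problem box, and near zero against visual discrimination.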

The consideration of this concept of validity coef-
ficients indicates clearly that only a vague and more or
less arbitrary line separates “validity coefficients” from
“reliability coefficients.” For, while reliability coef-
ficients may be defined theoretically as the correlations
between two measures which sample the same func-
tions in the same proportions in each case, practically
speaking, reliability coefficients are ordinarily calcu-
lated from measures which only approximate this
ideal, and they really sample slightly different ranges
of functions. However, as validity coefficients become
more and more restricted in their scope of reference,
they also approach this same status. So when, as in
the present study, certain of the correlations are of
scores on two different mazes, it is a more or less
arbitrary choice whether one calls the coefficients
validity coefficients or reliability coefficients. Strictly
speaking, they are validity coefficients, but, practically
speaking, the two measures measure ranges of functions
that are so nearly identical that they can actually be
used as reliability coefficients.

C. Problems of Reliability Coefficients

In evaluating the different possible methods of fig-
uring reliability coefficients, we will be aided if we
keep in mind the concept of reliability as being the 
measure of the extent to which chance factors have 
been excluded from obscuring the measurement of 
some more or less fundamental characteristic of the 
members of a group. Measurements of reliability in 
this sense could be had directly, if it were possible to 
secure two samples of performance which would satisfy 
the requirement of being absolutely independent 
measures of the same thing. In studies of behavior, it 
is rare that reliability coefficients can be taken as alto-
gether accurate indexes of the reliability of the measur-
ing instruments involved, either because they are not
independent measures or else because they are not en- 
tirely measures of the same thing. 

A few illustrations have already been given of the
ways in which reliability coefficients may be defective,
but the entire matter may now be summarized here.
In the first place, it is possible to cite cases where the 
reliability coefficients are higher than the reliability 
of the measuring instrument really justifies. For ex-
ample, where the animals of a group are not equally
motivated, they tend to be consistently different be-
cause of this difference in motivation. Moreover, from
the standpoint of most maze experiments, such dif- 
ferences resulting from poor control of feeding are the 
result of essentially irrelevant factors, and the correla- 
tion of odd and even trials, for example, is not a corre- 
lation of two independent measures of the same thing, 
but of two measures between which certain systematic 
errors exist which raise the correlation. On the other 
hand, the lack of independence between the two 
measures may operate to reduce the correlations. In 
the present experiment, negative correlations were 
found with certain groups between errors on the first 
trial and errors on the remaining trials. The most 
reasonable interpretation of this is that the scores on 
the first trial were largely determined by chance, but 
the subsequent performance tended to be improved if a 
rat had, by chance, made an unusually large error score 
on the first run. In this case, the reliability coefficient
will be lowered rather than raised as a result of the 
lack of independence of the measures.

It is highly probable that correlations of odd versus
even trials or of such parts of learning as Trials 1-10
versus 11-20 on the same maze have been influenced by 
such carry-over as above described. For this reason, 
the test-retest method of calculating reliability coef- 
ficients is of particular interest. Particularly where the 
animals are returned to normal feeding and where a 
considerable period of time is allowed to intervene be- 
tween test and retest, there is relatively little possibility 
that systematic experimental errors will last over, or 
that transfer effects will give rise to spurious correla- 
tions. This is particularly true if the retest is given 
on a maze with a different pattern. 

Where the test and retest are separated by too long 
an interval of time, the resulting reliability coefficients 
may be defective in a different way, namely, that while 
they are independent measures, the same thing does not 
exist to be measured in the two cases. In other words, 
with intervals as long as those used by Plcron (6), the
maze ability may change and the reliability of the maze 
as a measure of the learning ability existing in either 
period may really be higher than the test-retest
coefficient would indicate. The conclusion would seem to
be, then, that with maze studies it is perhaps impossible 
to get reliability coefficients which are not more or less 
biased in one direction or another. To estimate relia- 
bility, therefore, the procedure that seems necessary is 
to calculate reliability coefficients in a number of ways, 
using those methods of calculation which most approxi- 
mate the ideal of having independent measures of the 
same thing, and then estimating reliability from the 
total collection of reliability coefficients, trying to make
due allowance for any systematic errors which may 
have affected the correlations. 

Let us turn, then, to a critical evaluation of the different
methods of calculating reliability coefficients
which have been suggested during the history of the
problem. Stone and Nyswander have used the following
methods: (a) correlation of scores on odd
versus even blinds; (b) correlation of scores on the
first versus the second half of the maze; (c) correlation
of scores on odd versus even trials; and (d) correlation
of scores on different groups of trials. All of these
methods may be characterized as measures of the internal
consistency of maze scores. Methods c and d
have been used in the present experiment, but Methods
a and b are rejected because these correlations are
probably too seriously affected by systematic errors. An
error made by chance on one blind, for instance, may
tend to confuse the rat for some blinds following. There
is not so great a probability, however, that scores on
successive days or successive groups of trials would be
quite as lacking in independence.
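Methods c and d can be made concrete with a short computation. The sketch below is illustrative only. It assumes each rat's record is a Python list of error scores, one entry per trial, and that every rat was run for the same number of trials. The function names are hypothetical and do not come from the original study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def odd_even_r(errors_by_rat):
    """Method (c): correlate each rat's total errors on odd-numbered
    trials (1st, 3rd, ...) with its total on even-numbered trials."""
    odd = [sum(trials[0::2]) for trials in errors_by_rat]
    even = [sum(trials[1::2]) for trials in errors_by_rat]
    return pearson(odd, even)


def trial_groups_r(errors_by_rat, first, second):
    """Method (d): correlate totals over two groups of trials given as
    0-based (start, stop) slices; Trials 1-10 vs 11-20 is (0, 10), (10, 20)."""
    a = [sum(t[first[0]:first[1]]) for t in errors_by_rat]
    b = [sum(t[second[0]:second[1]]) for t in errors_by_rat]
    return pearson(a, b)


# Hypothetical error records for four rats over six trials.
rats = [
    [9, 7, 5, 4, 2, 1],
    [12, 10, 8, 6, 5, 3],
    [6, 6, 3, 2, 1, 0],
    [10, 7, 6, 5, 2, 2],
]
print(round(odd_even_r(rats), 3))
print(round(trial_groups_r(rats, (0, 3), (3, 6)), 3))
```

Both figures inherit any carry-over between neighboring trials that the text warns about. The code only computes the correlations and does nothing to remove that dependence. A group in which every rat has the same total has zero variance, and `pearson` will then divide by zero.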

The third method used in calculating reliability 
coefficients in the present experiment is the test-retest 
method. In addition to the merits suggested above, in 
the previous discussion of this method, an additional 
merit comes from the fact that one is interested in 
knowing the reliability of the entire series of maze 
trials. With the test-retest correlation, one can take the 
raw correlation coefficient as indicative of the relia- 
bility coefficient of either test separately when the 
assumption seems warranted that the test and retest
have both had about the same reliability. On the other 
hand, if one is trying to calculate the reliability coef- 
ficient of an entire series of trials from the various 
internal-consistency correlations, one must use the very 
dubious procedure of calculating the reliability coef- 
ficient of the series with the use of the Brown-Spear- 
man formula. It has been empirically demonstrated, 
of course, that this formula predicts accurately the
effects of lengthening certain types of educational tests
within certain limits of increase of length. However, 
this formula is peculiarly dependent on the assumption 
that the added material is of equal difficulty, and in- 
dependent of the material to which it is added. And 
as has been indicated before, it seems certain that the 
various parts of one training series do not have this 
independence. 
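For reference, the Brown-Spearman (Spearman-Brown) prophecy formula predicts the reliability of a series n times as long as one whose reliability is r: $r_n = nr/(1 + (n-1)r)$. The short sketch below simply evaluates the formula. Its validity rests on exactly the assumption questioned above, namely that the added trials are parallel to the original ones and independent of them.

```python
def spearman_brown(r, n):
    """Predicted reliability of a series n times as long as one with
    reliability r. Assumes the added parts are parallel: equal difficulty,
    and errors independent of those in the original part."""
    return n * r / (1 + (n - 1) * r)


# Stepping up a hypothetical half-series (e.g., odd vs. even trials) correlation.
half_r = 0.60
print(round(spearman_brown(half_r, 2), 3))
```

If the trials within one training series are not independent, the stepped-up figure will misstate the reliability of the full series. The error runs in whichever direction the dependence has already biased the half-series correlation, and nothing in the formula detects it.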

Still another method has been suggested and used by 
Lashley (1929). Lashley's procedure, and the merits
he claims for it, are indicated in the following:

"To avoid overlapping of the data and consequent 
spurious correlation, the average time per trial and the 
average errors per trial for the first 10 trials of learning 
were correlated with the total number of trials required 
to reach the criterion of learning. The measures correlated
are thus mutually exclusive” (15, pp. 20-21). 

Whether these merits actually hold for this method, 
however, depends on whether any appreciable number 
of animals meet the norm of mastery before the 10 
trials are completed. If such is the case, there actually 
may be some element of spurious correlation due to the 
fact that errors in the latter part of the learning curves 
of such animals will tend to raise scores on both types
of measurements. This defect, however, can be avoided 
by correlating errors from such a limited group of 
trials that only a few animals, at the most, will have 
met the norm of learning in that period. Even with
this precaution, however, one fact that still remains is 
that the correlated measures are from the same training 
series and are therefore presumably subject, to some 
extent, to the same defects as the other measures of the 
internal consistency of a maze experiment. Still an- 
other objection to this method of figuring reliability, 
however, is that it depends upon the correlation of 
measurements of two different types, and that unrelia- 
bility of either type of measurement will make the re- 
liability coefficients from the combination rather lower 
than the reliability of the better of the two. 
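The overlap problem can be checked directly by counting how many animals reach the norm of mastery inside the early window. The sketch below is hypothetical. For illustration only, it assumes a criterion of three consecutive errorless trials and counts the trial on which that run is completed as "trials to learn." Lashley's own criterion and counting convention may differ. Rats that never reach criterion are simply dropped.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def trials_to_criterion(errors, run=3):
    """Trial number (1-based) on which the first run of `run` consecutive
    errorless trials is completed, or None if it never is."""
    streak = 0
    for trial, e in enumerate(errors, start=1):
        streak = streak + 1 if e == 0 else 0
        if streak == run:
            return trial
    return None


def early_errors_vs_trials(errors_by_rat, window=10, run=3):
    """Correlate mean errors in the first `window` trials with trials to
    criterion. Also return how many rats met criterion within the window,
    i.e. the cases in which the two measures overlap."""
    early, to_learn, overlap = [], [], 0
    for errors in errors_by_rat:
        t = trials_to_criterion(errors, run)
        if t is None:
            continue  # never learned: excluded in this sketch
        head = errors[:window]
        early.append(sum(head) / len(head))
        to_learn.append(t)
        if t <= window:
            overlap += 1
    return pearson(early, to_learn), overlap
```

A nonzero overlap count signals the spurious element described above. Shrinking `window` until the count is zero, or nearly so, corresponds to the remedy proposed in the text.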

Still another method of correlating scores, to deter- 
mine the extent to which chance factors were influenc- 
ing maze scores, used by Hunter (7) and Heron (4), 
is the procedure of correlating Vincent scores rather 
than raw scores. Just how the resulting reliability coef- 
ficients are to be interpreted, neither Hunter nor Heron 
has indicated, and the fact that Vincent curves are of 
the highest value for certain other problems (namely, 
where one wishes a picture of the form of the learning 
curve for a group of subjects) does not insure that they 
are useful here. At least one serious objection to 
their use can be pointed out. This objection is that
converting raw scores into Vincent scores spuriously raises
the correlation by making the scores in any one tenth 
dependent on the total number of trials required by 
that subject to attain the norm of mastery. Any chance
factor which increased the number of trials required by 
the rat to reach the norm of mastery would affect the 
scores through all the tenths of learning. Hence, corre- 
lations of Vincent scores would be affected by some 
unknown and more or less unmeasurable element mak- 
ing for spuriousness in correlation. On the other hand, 
with correlations of odd versus even trials, while it is 
true that chance errors may affect the two series of 
scores correlated, in those cases where the effect of the 
chance factor endures for more than one trial, there 
is nothing in the mathematical treatment of the data 
which would tend to make for spurious correlations, 
as is the case with Vincent scores. 
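The objection is easier to see with one way of forming Vincent scores, sketched here under an assumption. Each rat's record, up to the norm of mastery, is divided into k equal fractions. When the number of trials n is not a multiple of k, a trial that straddles a boundary is shared proportionally between the adjoining fractions. Other Vincent procedures interpolate differently.

```python
def vincentize(scores, k=10):
    """Mean score in each of k equal fractions of one subject's record.

    Each of the n trial scores is repeated k times, and the expanded list of
    n * k entries is cut into k consecutive blocks of n entries, so a trial
    that straddles a boundary contributes proportionally to both fractions.
    """
    n = len(scores)
    expanded = [s for s in scores for _ in range(k)]
    return [sum(expanded[i * n:(i + 1) * n]) / n for i in range(k)]


# Two hypothetical rats with identical first three trials but different
# numbers of trials to the norm of mastery.
fast = [8, 5, 2, 0, 0]
slow = [8, 5, 2, 4, 3, 1, 2, 0, 0]
print(vincentize(fast, k=5))
print(vincentize(slow, k=5))
```

The two rats' first-fifth scores differ (8.0 versus about 6.67) even though their opening trials are identical. Every fraction is scaled by the same n, so a chance event that adds trials before mastery shifts all of a rat's Vincent scores together.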

In addition to the precautions suggested above, a 
most important consideration in the comparison of 
reliability coefficients derived with different groups is 
the consideration of the influence of different ranges 
of talent in the different groups. This is important be- 
cause of the fact that, with the same maze pattern and 
maze procedure, the size of the reliability coefficients 
secured will be to a major extent a function of the range 
of true ability in the groups. The more heterogeneous
the group, all other things being equal, the higher will 
be the reliability coefficients. 

As Kelley and Shen state:

"As the range of a distribution increases, . . . errors of
estimate remain relatively constant, and consequently r
will increase with the standard deviation. Assuming,
then, that errors of estimate are equal for different ranges,
we have the following relation between the dispersion of
the distributions and the magnitude of the correlation
coefficient:

$$\sigma\sqrt{1 - r^2} = \Sigma\sqrt{1 - R^2}$$

where r is the correlation coefficient derived from a distribution
with a standard deviation of σ, and R from
distributions with a standard deviation of Σ. Since the
range of data is more or less arbitrary and accidental in
any given study, it should always be taken into account
in interpreting the correlation coefficient derived" (21,
p. 840).

The clearest illustration of this tendency that is 
available from maze data is found in the comparison 
by Lashley (15, pp. 20-22) of the reliability coefficients
of his maze when figured, first, from the records of
normal animals, and then from the records of a group
with brain lesions. The reliability coefficients found
by means of correlating total trials to learn with time 
and errors in the first 10 trials are presented in Table 1. 
In correlating scores for time, errors, and trials on two 
mazes, Lashley found the correlations for the two 
groups as shown in Table 2. 

TABLE 1

Reliability Coefficients from Lashley's Experiments as
Figured by Correlation of Trials to Learn with Time
and Errors, Respectively, in the First Ten Trials

                                     N in     Reliability coefficients
Scores correlated                    group    Time            Errors
Learning scores of normals             59     .09 ± .09       illeg. ± .09
Learning scores of operated rats       37     illeg.          illeg.
Retention scores of operated rats      59     .76 ± illeg.    .79 ± illeg.

TABLE 2

Correlations Found by Lashley between Scores on Two
Different Mazes with Normal and with
Operated Rats

Scores correlated    Normal group    Operated group
Time                 -.09 ± .18      .55 ± .09
Errors               -.16 ± .16      .67 ± .07
Trials               illeg.          .61 ± .07

Of course, it must be admitted that between his
normal group and his operated group there was a difference
in range of ability such as might never be found
between two groups of normal animals. Nevertheless,
the probability of differences in range of ability in different
groups is serious enough that consideration
of this factor is important. In Tryon's (30)
experiment, for instance, because Tryon
was primarily interested in studying the inheritance of
maze-learning ability, the rats were selected so as to
yield groups as heterogeneous as possible. It is reasonable
to expect, therefore, that the reliability coefficients
which he secured are higher than would be secured by
another experimenter using the same maze and exactly
the same procedure, but with groups of rats as homogeneous
as those used in Stone and Nyswander's experiment or
in the present experiment.

Just how much higher the reliability coefficients are
as a result of unusual heterogeneity of subjects in maze
experiments cannot be estimated, however. The procedure
suggested by Kelley and Shen for estimating
the influence of range, namely the use of the
formula

$$\sigma\sqrt{1 - r^2} = \Sigma\sqrt{1 - R^2},$$

is not applicable to maze data because
(as the present experiment has demonstrated) the
assumption mentioned in the above quotation
is not satisfied with maze data. Errors of estimate do
not remain relatively constant under the different conditions
under which different experiments are run. The
errors of estimate, as will be shown later, vary with the
same group in different portions of learning, and differ
between groups for the same set of trials whenever any
such factors as differences in feeding exist between the
two groups. With test scores it is mainly true that only
differences in range of ability sampled and errors of
measurement will cause differences in the sigmas of
different distributions. With maze scores, however,
the sigma of scores for a given portion of learning will 
be determined not only by the degree of heterogeneity 
of the group and by the unreliability of the measuring 
instrument, but also by whatever factors tend to pro- 
duce a high or low mean score for the group for that 
portion of learning. Hence, if we applied the formula
suggested by Kelley and Shen, the groups with the
lowest learning curves would be indicated as having
the greatest reliability simply because
(due to stronger motivation or better preliminary preparation,
perhaps) their scores are more closely
grouped and the subjects appear to be a very homogeneous
group. To take an extreme case, if learning has
been carried on long, most of the group may be making
zero scores, and the dispersion of scores will be extremely
small; but this is obviously not dependent upon
great homogeneity of the group.
The difficulty, then, is to get the variability coming
from such sources as motivation separated from the 
variability coming from differences in ability and from 
variability due to unreliability of measurement. Some 
sort of direct measure is needed of the amount of varia- 
bility due to differences in ability, for the portion of the 
variability due to such differences in range of ability 
should be taken out; but the variability determined by 
other factors should be left in as part of the experi- 
mental data. 

There is a formula suggested by Garrett (3, pp. 276-277),

$$\frac{100\,\sigma\sqrt{1 - r}}{M},$$

which might be used for comparing the reliabilities
of different maze groups, if it were true that whatever
factors affect the sigma of scores of a group would
affect the mean of the scores in the same proportion.
(It is to be noted that this formula of Garrett's is
merely a cross between the formula for the standard
error of a score,

$$\sigma_{\text{score}} = \sigma\sqrt{1 - r},$$

and the coefficient of variation, $V = 100\sigma/M$.)
Wherever this condition holds, this formula makes pos- 
sible the comparison of the reliabilities from different 
groups in the same manner as the ordinary formula for 
the standard error of a score permits the comparison of 
reliabilities from different groups when the errors of
estimate with the different groups are known to be
equal. However, with maze data there does not exist this
close and direct relation between the sigma of scores
for any portion of learning and the mean of the scores
for that same portion; the factors that affect the mean
affect the sigma in a different degree in different cases.
Thus, when the coefficients of variation for the group of
107 rats of Tryon's experiment are calculated, the coefficients
for the successive groups of 3 trials on the first
maze are 27, 63, 78, 90, 95, and 105; and on the second
maze, 40, 57, 80, 77, 91, and 88. Inasmuch as the range
of true ability was that of the same group throughout,
and inasmuch as the reliabilities of these different periods
were by no means as different as these figures
are, one is forced to conclude that the factors that affect
the means with maze groups do not affect the sigmas
to the same degree, and that, consequently, even this for- 
mula suggested by Garrett can hardly be used for the 
comparison of the reliabilities found with maze groups 
having different ranges of ability. It may be that, 
where differences in range are extreme, comparisons 
may be made slightly better through the use of this 
formula, but, with the differences in range more com- 
monly existing, the use of this formula would prob- 
ably lead to greater errors than it would cure. 
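The quantities compared above can be written out as follows. This is a sketch under stated assumptions. Sigma is taken as the population (N-denominator) standard deviation, M is the group mean, and r is a reliability coefficient. The helper names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def standard_error_of_score(sigma, r):
    """sigma * sqrt(1 - r)."""
    return sigma * sqrt(1 - r)


def coefficient_of_variation(scores):
    """V = 100 * sigma / M, with sigma the N-denominator standard deviation."""
    return 100 * pstdev(scores) / mean(scores)


def garrett_index(sigma, m, r):
    """100 * sigma * sqrt(1 - r) / M: the standard error of a score
    expressed as a percentage of the mean."""
    return 100 * standard_error_of_score(sigma, r) / m


def cv_by_block(errors_by_rat, size=3):
    """Coefficient of variation of the rats' total errors in successive
    blocks of `size` trials (the kind of figures quoted above for Tryon's
    rats). A block in which every rat scores zero has no defined V."""
    n_trials = len(errors_by_rat[0])
    result = []
    for start in range(0, n_trials - size + 1, size):
        totals = [sum(t[start:start + size]) for t in errors_by_rat]
        result.append(coefficient_of_variation(totals))
    return result
```

As the Tryon figures show, V need not stay constant from block to block even when the group itself is unchanged. Garrett's index therefore inherits whatever drift the changing mean introduces.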

Hence, the only way to safeguard against the
possibility that reliability coefficients from different maze
groups have been made incomparable by differences in 
ranges of ability is to use groups that are known to have 
approximately the same degree of homogeneity. Using 
stock that is highly inbred as the source of all the
groups gives an approximation to this ideal; but the 
only really satisfactory procedure seems to be the split- 
litter technique, the importance of which has been ex- 
perimentally demonstrated in so beautiful a fashion by 
Corey (2) in 1930. The benefits made possible by the
use of this technique, as Corey points out, are not only 
the rough equating of groups with respect to heredity, 
but also the equating of the groups with respect to 
seasonal variations in temperature and humidity and all 
the other changes which more or less inevitably slip 
in, despite the experimenter’s attempt to keep condi- 
tions absolutely constant. Strong corroboration of 
Corey’s points is to be found in the data of the present 
experiment. So important is this consideration that I 
would venture to say, with some assurance, that there 
is no means today whereby an experimenter can claim
that he has with reasonable certainty found that a
certain maze pattern or maze procedure yields greater
reliability than some other pattern or method used by
others except by using the procedure of the split-litter 
technique to compare the reliabilities of patterns or 
methods in question. This, of course, would not hold 
where experimenters had used approximately similar 
conditions and stock and yet had found extreme dif- 
ferences in the reliabilities, but would hold quite defi- 
nitely, I believe, when one seeks to estimate whether 
the procedures of Tryon, Husband, Liggett, Stone and
Nyswander, Yoshioka, or the present experiment yield
the more reliable results. 
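The split-litter assignment itself is simple to state in code. The sketch below shows one hypothetical way to do it; it is not Corey's published procedure. Littermates are shuffled and dealt alternately into two groups, and each litter starts with whichever group is currently smaller so that odd-sized litters balance out. One might also balance the sexes within litters, but that refinement is omitted here.

```python
import random


def split_litters(litters, seed=None):
    """Deal each litter's rats alternately into two groups.

    `litters` is a list of lists of animal identifiers. Within each litter the
    order is randomized. The deal starts with the currently smaller group, so
    each litter is split as evenly as possible and the two group sizes never
    differ by more than one.
    """
    rng = random.Random(seed)
    group_a, group_b = [], []
    for litter in litters:
        rats = list(litter)
        rng.shuffle(rats)
        if len(group_a) <= len(group_b):
            order = (group_a, group_b)
        else:
            order = (group_b, group_a)
        for i, rat in enumerate(rats):
            order[i % 2].append(rat)
    return group_a, group_b


# Hypothetical litters split between a 4-door group and a 1-door group.
litters = [["a1", "a2", "a3"], ["b1", "b2", "b3", "b4"], ["c1", "c2", "c3"]]
four_door, one_door = split_litters(litters, seed=1)
print(four_door)
print(one_door)
```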



IV 

EXPERIMENTAL RESULTS 
A. Introduction

The first object of this experiment was to determine 
whether, with the multiple-T maze, the highest relia- 
bilities could be secured with maximum, moderate, or 
minimum prevention of retracings. The second ob- 
ject was to determine whether a program of strong re- 
striction or a program of but moderate restriction of 
feeding would yield the higher reliability. The third 
object of the experiment was to throw additional light 
on the problem of the interpretation of reliability co- 
efficients from maze experiments. To this end various 
treatments have been worked out of the scores of dif- 
ferent groups on a single maze, and also three of the 
groups have been given retests after a fairly long rest 
interval following their original training, and the
test-retest correlations calculated.

The maze pattern used was essentially that of Stone 
and Nyswander (27) except that, due to the fact that 
their specifications of dimensions were not clear, the 
present maze was constructed with each unit longer 
than theirs. (The correct dimensions of their maze 
have been secured from Stone directly.) One trial was 
given each day, and Stone and Nyswander’s methods 
were duplicated as closely as possible, with the major 
exception that the preliminary training was given on 
a short straightaway rather than on the simple
one-tread problem box which they used, and for a 10-day
period rather than their 5-day period.

B. Apparatus and Method

The apparatus and procedure of the present experi- 
ment will be described in some detail because our 
knowledge is still so scanty regarding the factors in- 
fluencing reliability that when we have found a certain 
reliability from one particular experiment we cannot 
say with certainty which factors have been responsible.

1. The Straightaway Used for Preliminary Training. For
preliminary training, a small straightaway
was used. The starting box and the food box of this 
were similar to those of the regular maze, and the alley 
was the same in construction except that there were no 
turns or blinds. For use in the first three days of the 
preliminary training with certain of the groups, the 
length of this straightaway was 26" from the exit of 
the starting box to the entrance of the food box. For 
the remaining trials, with these same groups, and for 
all of the trials with all the other groups, the straight- 
away had a length of 6'. 

2. Construction of the Maze. The pattern of Maze
I is shown in Figure 1. Maze II was a mirror image 
of Maze I. It was constructed by turning Maze I up- 
side down and attaching the wire covering to the new 
top of the maze. The maze was constructed of 
white pine boards, and was painted a flat black on the 
inside. The alleys were 4" wide and 4" deep, inside
dimensions. The ceiling of the maze was made of 
hardware cloth nailed to the top of the walls of the 
maze. The maze had no floor of its own, but rested 
on the floor of the room. This floor was covered with 
varnished brown battleship linoleum which was cleaned 
after each day's work by mopping it up with a wet
cloth. To facilitate this cleaning, the maze was at- 
tached to ropes in such a way that it could be lifted as 
a unit to a height of about 3' from the floor. 

Retracing doors were hung at the points indicated in
Figure 1 by solid lines across the maze path. These
doors were constructed of hardware cloth stretched on
wire frames and were so hung from the top of the maze
that they could be lowered to prevent retracing. These
doors were operated from outside the room. At the
other ends of the alleys in each unit dummy doors were
hung so that the rats could not learn to make their turns
on the basis of visual cues derived from the doors. The
rats were released from the starting box by a door
sliding in guillotine fashion. To avoid all possible
noise during the run, this door was left open until the
rats had reached the food box.

FIGURE 1

The doors used to prevent retracing are indicated by solid lines across
the path; the pseudo-retracing doors are indicated by broken lines
across the path. The doors used with the 4-door groups are indicated
by X marks. With the 1-door groups only the door in the
last alley was used.

FIGURE 2

Diagram of the Elevated Maze

The third maze used was an elevated maze (Figure
2). The width of the path was 1^4" and the height
of the path from the floor was 21". No starting box
was used with this maze. The rats were merely placed
by hand at the starting point. This elevated maze was
not located in the sound-proof room (to be described
below), but as it was placed in a relatively secluded
room of the laboratory, and, as all of the trials on it
were given in the evening, there was relatively little
noise attendant on trials with it. In running the rats
on this maze, the experimenter stood at a small,
high-topped table about 4' from the maze. No screen was
used to conceal the experimenter from view. Each 
rat was carried into the room by hand, a distance of 
about 25', and was returned to its cage before the next 
rat was secured. 

3. The Room Used to Control Sound During Runs.
All of the trials on the straightaway in the preliminary 
training and all of the trials on Mazes I and II were 
given in a room which was approximately sound-proof. 
The main details of construction of this room are shown 
by the diagram, in Figure 3, of a corner of this room. 
For each trial, the rat was carried into this room and
left there alone until the completion of the run. Errors 
were recorded by the experimenter from outside the 
room. 

The walls of this room were constructed of Celotex,
½" in thickness, and of 1"-thick builders' felt. The
total structure consisted of two rooms built entirely
separate from one another except that the inner one
rested on the floor of the outer. The inside dimensions
of the inner room were 6' high by 9'9" x 9'9". The
floors of these rooms were of matched tongued-and-grooved
flooring and were separated by an air space
and layers of felt. The entire structure rested in
troughs containing felt strips and sand to minimize the
vibrations coming from the floor. All the essentials of
construction are shown in Figure 3.

FIGURE 3

Detail of the Construction of the Room for Controlling
Sound

Particular care was taken in the construction of the 
doors, of which there were two, one for each of the two 
rooms. To make possible the observation of the ani- 
mals from without the box, each door was constructed 
with a glass window. The bottom of the inner window 
was 41" from the floor of the box. The window in each 
door was double and was constructed of two panes of 
glass carefully placed in Celotex or paper felt, and
with a dead air space between the two panes. The
retracing doors and starting-box door of the maze were
operated by wires running through tiny holes in the 
walls of the rooms. Within the inner room ordinary 
noises from outside, as from the cages of the rats, could 
not be heard, and loud sounds could be heard only in 
a rather muffled way and with their direction confused. 

4. Animals. The groups of this experiment were
relatively homogeneous. Each group was composed of
as few litters as were necessary to give the desired
number of rats, and in addition to this the different litters
were fairly closely related. Most of the animals were 
from the Clark Psychology Laboratory colony, which
has been developed from Wistar stock and which has 
been rather generally inbred. The rats of Group C 
were secured directly from the Wistar Institute. 

Table 3 gives the main numerical data on the
animals used. A word is in order as to the group symbols
used. The group designation is a capital letter
(A, B, etc.), but the group symbol that will be generally
used will indicate also the maze on which trained
(I, II, or the elevated maze), the number of retracing doors used (1,
4, or 13), and whether fed liberally or scantily (Lb. or
Sc.). Thus Group I.13.Sc.C refers to Group C as run
on Maze I, with 13 retracing doors used, and with a
program of scanty feeding. Two of the groups are indicated
by B and B', respectively, because these two
groups were composed of the same litters split as evenly
as possible between the two groups in order to make
them as comparable as possible in distribution of ability.
These two groups were also run at parallel times,
half of each litter being started on the 4-door maze and
the other half on the 1-door maze at the same time.
Thus these two groups were adequately equated not
only with respect to heredity and heterogeneity, but also
with respect to experimental conditions. In view of
the closeness of the learning curves and reliability coefficients
from these two groups, in contrast with the relationship
of other pairs of groups, the following experimental
work would have been much more conclusive if the
split-litter technique had been used throughout.

TABLE 3
Animals Used

Animal groups with their ages, numbers of individuals, numbers by
sex, number of trials, and mazes on which used.

Group              No.    No.    Total   Age in days at       No. days between   No. trials
designation        ♂      ♀      No.     start of training    groups of trials   on maze
                                         on straightaway
I.4.Lb.A           23     illeg. illeg.  55 to 115                               30
I.4.Sc.B           15     16     31      52 to 102                               30
I.1.Sc.B'          18     13     31      52 to 102                               30
II.1.Sc.D          12     20     32      110 to 116           40                 30
I.1.Sc.D                                 180 to 186                              10
I.13.Sc.C          13     21     34      50 to 51             40                 30
II.13.Sc.C                               120 to 121                              20
Elevated maze, C   11     20     31      150 to 160           0                  till learned
I.13.Sc.E          13     20     33      111 to 117           43 to 45           illeg.
                                         166 to 194

With Group I.13.Sc.E, the interval of 43 to 45
days separates a first and a second group of trials on 
the same maze. The interval consisted of 38 to 40 days 
of rest, followed by 5 days of training on the straight- 
away (one trial per day, with the full-length straight- 
away) before the retest. The procedure was different 
with Group C and Group D in that with these groups 
the retest was given on a different maze from that used 
in the original learning. With Group C the first
group of trials was given on Maze I and the second 
group of trials on Maze II; with Group D the pro- 
cedure was the reverse of this. With these two groups 
the interval of 40 days between the learning of the 
first maze and the learning of the second maze was 
composed of 30 days of rest and 10 days of training on 
the straightaway. 

For purposes of determining the effect on reliability
of varying the amount of retracing permitted, the most
important comparisons are of Groups I.4.Sc.B, I.1.Sc.B',
II.1.Sc.D, I.13.Sc.C, and I.13.Sc.E. As regards
the relation of feeding to reliability, the comparison is
primarily between Group I.4.Lb.A and Group I.4.Sc.B.
The test-retest correlations involve Groups I.13.Sc.C
and II.13.Sc.C, Groups II.1.Sc.D and I.1.Sc.D,
and the first and second groups of trials of Group I.13.Sc.E.

5. Procedure. During the experiment the rats were
fed only after their maze run for the day, the food being
a dry powdered McCollum diet. The ingredients
(proportions by weight) were:

    Whole wheat flour     290
    Whole milk powder      75
    Casein                 40
    Salt                    6
    Calcium carbonate       4
                          ---
    Total                 415

No other food was given throughout the entire period 
of experimentation except with a few of the very 
youngest rats which showed inability to keep their 
weight and strength up sufficiently when shifted to this 
dry food. Water was before the rats in the living 
cages all the time; and, when the rats were fed after 
their daily experimental work, water was available in 
the food compartments. Prior to the beginning of an 
experiment, dry food was kept before the animals con- 
stantly, and, in addition, their diet was supplemented 
with bread, milk, and green vegetables. 

During the experimental period from 2 to 9 rats 
were kept in each cage, with the sexes always segre- 
gated. Cleaning of cages was done only after the ex- 
perimenting for the day, and an effort was made to 
avoid all other conditions which might disturb the rats
during the experimental period. 

Care was exercised in controlling the feeding because 
of the fact that, otherwise, consistent differences in
performance might have resulted from differences in
hunger. Before the experiment with any group, the rats
were weighed once a day for two or three days, to de- 
termine the normal weights at that time. A weight 
schedule was then calculated for each rat on the basis 
of this normal weight. For example, with some groups 
the object was to bring the rats to 80% of their normal 
weight at the end of the first 10 days, and to other per- 
centages at other portions of the learning period. (For 
the details of this, see Figures 12 and 13 and the ac- 
companying text.) The rats were weighed every day 
of the experiment and their weight fluctuations
compared with those scheduled for them. The weights
were controlled by adjusting, for each rat separately, 
the length of the feeding time in accordance with what 
seemed necessary to adjust the weights to the schedules. 
The average feeding time for the scantily fed groups 
was perhaps about twenty minutes a day. Further data 
on the weight fluctuation of different groups will be 
presented later in the section of the results that deals 
with the relation of weight to learning scores. 
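The weight-control procedure can be sketched as a target schedule plus a daily comparison. Everything here beyond the text's example of 80% of normal weight at day 10 is an assumption. The schedule is interpolated linearly between hypothetical (day, fraction-of-normal) checkpoints, and normal weight is taken as the mean of the pre-experimental weighings. The code only reports each rat's deviation from its target; adjusting the feeding time is left to the experimenter.

```python
from statistics import mean


def target_weights(normal_weight, checkpoints, days):
    """Target weight for days 1..days.

    `checkpoints` are (day, fraction_of_normal) pairs with distinct days.
    Fractions are interpolated linearly between checkpoints and held
    constant before the first checkpoint and after the last one.
    """
    pts = sorted(checkpoints)
    targets = []
    for d in range(1, days + 1):
        if d <= pts[0][0]:
            frac = pts[0][1]
        elif d >= pts[-1][0]:
            frac = pts[-1][1]
        else:
            for (d0, f0), (d1, f1) in zip(pts, pts[1:]):
                if d0 <= d <= d1:
                    frac = f0 + (f1 - f0) * (d - d0) / (d1 - d0)
                    break
        targets.append(normal_weight * frac)
    return targets


# Hypothetical rat: three pre-experimental weighings, then a schedule that
# reaches 80% of normal on day 10 (the example in the text) and holds there.
normal = mean([198, 202, 200])
schedule = target_weights(normal, [(0, 1.0), (10, 0.8)], days=14)
observed_day_5 = 183
deviation = observed_day_5 - schedule[4]  # positive: rat is above schedule
print(round(deviation, 1))
```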

With every group, 30 preliminary trials were given
on the straightaway before training was begun on the 
first maze learned (3 trials per day were given for 10 
days). Following the tenth day on the straightaway, 
one trial a day on Maze I or II was given for periods
of from 7 to 30 days with different groups. With 
Groups C and D, retesting was given after 30 days of
rest and 10 days of a second preliminary training. Dur- 
ing the 30 days’ rest the animals were fed liberally, the 
diet consisting not only of dry food constantly before 
them, but also of daily feedings of bread, milk, and 
green stuff. With the preliminary training for the sec- 
ond maze the same general type of control of feeding 
was instituted as with the first maze, proper allowance 
being made for the differences in feeding necessitated 
by the greater age on the second training. With these 
retested groups, the second preliminary training con- 
sisted of merely one trial per day for ten days on the 
6' straightaway. One trial a day was also given with 
the second maze learned. 

On Mazes I and II all of the rats were run in the
afternoon between 1 P.M. and 5 P.M. An effort was
made to run the same litters at exactly the same time 
every day and, in general, there were no greater fluc- 
tuations from schedule than about twenty minutes. 

The procedure used with the elevated maze was radically
different. Only rats of Group C were tested with
this maze, and the training was given on whatever
days the rats of this group finished their training on
Maze II. On this elevated maze, training was massed.
Trials were given with only 45 seconds intervening
between trials until each rat had met the norm of three
errorless runs in succession, or four errorless runs in
five successive trials. Only a nibble of dry food was
allowed at the end of each run. The preliminary training
for these trials on the elevated maze consisted of
two trials given on a simple elevated straightaway of the
same construction as the regular elevated maze, but
with a path 28" long.

Two types of errors have been recorded: (1) forward
errors, when a rat entered a blind alley for a distance
of two-thirds or more of its body length when
coming to a point of choice from a previous segment
of the true path, and (2) retracing errors, when a rat
retraced any segment of the true path, or retraced into 
a blind. The forward errors rather than retracing er- 
rors have been used in the calculations of learning 
curves, distribution of scores, reliability coefficients, 
etc., except in the few cases where it is expressly stated 
that retracing errors have been used. This is the 
procedure used by Stone and Nyswander (25) and by 
Heron (6). 
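The two error definitions used in the present experiment can be expressed as a counting rule over the sections a rat enters during a run. This is a minimal sketch that assumes a hypothetical record format: each entry notes whether the section was a blind, whether the rat was heading back toward the start, and how far in it went. The text states the two-thirds body-length threshold only for forward errors, so the sketch applies it only to forward entries into blinds.

```python
# Hypothetical record format and helper; not taken from the monograph.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entry:
    """One entry into a maze section during a run."""
    blind: bool          # True for a blind alley, False for a segment of the true path
    backward: bool       # True if the rat was moving back toward the starting point
    depth: float = 1.0   # distance entered, in body lengths (used for forward blind entries)


def count_errors(entries: List[Entry]) -> Tuple[int, int]:
    """Return (forward_errors, retracing_errors) under the present study's usage."""
    forward = retracing = 0
    for e in entries:
        if e.backward:
            # retracing a segment of the true path, or retracing into a blind
            retracing += 1
        elif e.blind and e.depth >= 2 / 3:
            # forward entry into a blind, two-thirds of a body length or more
            forward += 1
    return forward, retracing


run = [
    Entry(blind=False, backward=False),
    Entry(blind=True, backward=False, depth=0.5),   # too shallow to count
    Entry(blind=True, backward=False, depth=0.8),   # forward error
    Entry(blind=False, backward=True),              # retracing error
    Entry(blind=True, backward=True, depth=0.3),    # retracing error
    Entry(blind=False, backward=False),
]
print(count_errors(run))  # (1, 2)
```

Stone and Nyswander's usage groups a consecutive run of retracing entries into a single error, so it would need a different rule. That rule is not sketched here.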

The above usage was adopted in the effort to dupli- 
cate the procedure of Stone and Nyswander. Recent 
personal inquiry, however, reveals that the original 
statement of their definition of types of error was not 
correctly interpreted. The differences may be
illustrated by Figure 4. According to the usage of the
present experiment, one forward error, (1), would have
been recorded, and six retracing errors, (2), (3), (4),
(5), (6), and (7). According to the usage of Stone
and Nyswander, two forward errors would have been
recorded, (1) and (2), and but two retracing errors:
(3), (4), and (5) as the first, and (6) and (7) as the
second. Heron’s procedure, which also was designed
to duplicate Stone and Nyswander’s, follows their
procedure except that (2) would have been counted as a
retracing error (letter from Heron).

FIGURE 4

Diagram of the Different Types of Error (See Text)

These differences in definition in these three studies 
are regrettable, but are not of much practical conse- 
quence. The difference in definition of forward error 
concerns a type of error made only rarely; and, as re- 
gards the retracing errors, these have entered into but 
few of the important calculations. 

C. Results

The results of this experiment which are of primary
interest are the data on reliability. However, it must
be remembered that reliability coefficients are a
function not merely of the apparatus used, but also of the
procedure and of the groups tested. Therefore, rather
complete data are presented on the learning curves on
the straightaway and maze, on the weight curves, and
on the variability of scores in different periods of
learning.

FIGURE 5

Mean Time of the Runs on the Straightaway of Those
Groups with Which the Six-Foot Straightaway Was Used
for All Trials

With Groups II.13.Sc.C and I.1.Sc.D, which had been
tested before, only one trial a day was given. With Groups I.4.Lb.A,
I.4.Sc.B, and I.1.Sc.B', three trials a day were given, and the value
for any particular day represents the mean time of the three runs
of that day.

1. Performance on the Straightaway During the
Preliminary Training. As indicated on the graphs,
certain of the rats had all of their trials on the full-length
straightaway, and the remainder were trained
for the first three days (9 trials) on a 26" straightaway.
It was found that the use of the abbreviated straightaway
for these early trials saved time without apparently
sacrificing any of the value of the preliminary training.
It may, of course, be that one program of preliminary
training has some slight advantage over the other. If any
conclusions may be drawn from the mean time curves for
the later trials on the straightaway (see Figures 5 and
6), however, the difference would seem to be very slight
indeed, for, from the fifth day on, the mean time curves
for the different groups are hardly distinguishable.

FIGURE 6

Mean Time of the Runs on the Straightaway of Those
Groups with Which the 26-Inch Straightaway Was Used
for the First Three Days and the 6-Foot Straightaway
for the Remaining Seven Days of the Preliminary
Training

The value for any particular day represents the mean time of the
three runs of that day.

2. Learning Curves of the Different Groups. The
main reason why learning curves of the present study
are of interest is the fact that they are markedly lower
than any of the learning curves of the groups used by
Stone and Nyswander in their study of reliability. This
is shown by Figure 7 on which the mean number of 
errors per trial is graphed for the different groups. To 
facilitate comparison with the results of Stone and Nys- 
wander, two lines have been added to this figure to 
represent, first, the mean errors of their eight groups 
and, second, the minimum values achieved by any of 
their groups.* 

It is to be noted that only one of the eight learning
curves of the present study has values as high as those of the
lowest of Stone’s groups after about the first 3 trials.
All the remaining groups of the present study have
reached, on an average in 10 trials, a point lower than
Stone’s groups reached on an average in 30 trials.
Moreover, the learning curves of Stone and
Nyswander fall rather steadily throughout the entire
30 trials, whereas most groups of the present study show
practically no improvement after the first 10 trials.
It is probable that this difference will be reflected to
some extent in the reliability coefficients, because the
situation in the present experiment would seem to
correspond to the situation where an educational test is
applied to a group for which (with the exception of a
few very difficult items) the material is too simple.
The accumulation of scores at zero tends to give lower
reliability coefficients for such a group, other things
being equal, than would be secured with the same test
with a group having a range of ability such as would
tend to give normal distribution of scores.

*Stone and Nyswander, in their article on the reliability of the
multiple-T maze, give no table of the mean errors. Consequently,
it has been necessary to draw these graphs from a visual inspection
of the graph of the learning curves of their groups. The upper line
represents the mean for all eight of their groups as estimated by the
apparent mean values for Trials 1, 5, 10, 15, 25, and 30. The lower
line indicates the lowest values that any of their groups had on these
same trials.
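The claim that an accumulation of zero scores lowers reliability coefficients, other things being equal, can be checked with a small simulation. This is an illustrative model and not the author's. Each hypothetical rat has a latent error-proneness, and two half-tests add independent noise to it. An "easy maze" is imitated by subtracting a constant and clipping negative scores to zero, so that most scores pile up at zero.

```python
# Illustrative simulation with hypothetical names; not from the monograph.
import random


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=5000, floor_shift=1.0, seed=1):
    """Return (r without floor, r with scores clipped at zero) for two half-tests."""
    rng = random.Random(seed)
    proneness = [rng.gauss(0, 1) for _ in range(n)]
    half_a = [p + rng.gauss(0, 1) for p in proneness]
    half_b = [p + rng.gauss(0, 1) for p in proneness]
    r_raw = pearson(half_a, half_b)
    # Easy test: shift scores down and clip, so roughly three quarters become 0.
    clip_a = [max(0.0, s - floor_shift) for s in half_a]
    clip_b = [max(0.0, s - floor_shift) for s in half_b]
    return r_raw, pearson(clip_a, clip_b)


r_raw, r_clipped = simulate()
print(round(r_raw, 2), round(r_clipped, 2))
```

Under this model the unclipped split-half correlation is about .5, and the clipped one comes out lower, which is the direction the text predicts. The size of the drop depends entirely on the assumed model.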

There are a number of differences between the
present experiment and Stone and Nyswander’s which
may be responsible for the differences in rapidity of
learning. These differences are: (a) use of the room to
control sound, (b) preliminary training of 10 days
rather than of 5, (c) use of the straightaway in
preliminary work rather than the problem escape box used
by Stone and Nyswander, (d) the 4" greater length of
units in the present maze, (e) possible differences in
strength of motivation, and (f) differences in the
animals.

It would not seem probable that differences in
motivation are chiefly responsible for the difference in
learning unless Stone and Nyswander are mistaken in
their conclusion that they had found and used the feeding
schedule which provided approximately optimum
conditions for rapid learning.

It is to be noticed that the learning curves of the
present study, in spite of their essential similarity to
most of the learning curves, tend to fall more or less
into several distinct groups. Thus, the 13-door groups
(I.13.Sc.C, II.13.Sc.C, and I.13.Sc.E) have somewhat
the highest learning curves (except for Group I.4.Lb.A
in the latter portions of its course). The 1-door groups
(II.1.Sc.D, I.1.Sc.D, and I.1.Sc.B') have, in general,
the lowest learning curves. Group I.4.Sc.B and Group
I.1.Sc.B', which, it will be remembered, were formed
of split litters and run on the same days, have virtually
the same learning curves. Group I.4.Lb.A, after the
sixth trial, has a learning curve much higher than any
other group. It will be shown later that the
explanation for this is the different feeding program used with
this group. Hence, considering merely the groups
with roughly equal motivation, it can be seen that the
graphs of Figure 7 indicate an inverse relationship
between the number of retracing doors used and the
rapidity with which forward-going errors are
eliminated.

The mean retracing errors for the various groups
for the early trials are shown in Figure 8. It is to be
noted that these curves drop much more sharply than do
the curves for forward errors (see Figure 7). Most of
the retracing errors were made on the first trial alone,
and after the fifth or sixth trial the only group that
made an appreciable number of retracing errors was
Group I.4.Lb.A. Retracing errors are concentrated so
largely in this first trial, and this first trial is so largely
a matter of chance, that it seems justifiable to conclude
that maze measures will be improved either by disregarding
retracing errors altogether or else by disregarding the first
trial, as Tolman, particularly, has recommended. That either
of these courses, or even the two of them together, would be
only a partial remedy, however, is indicated by the fact,
as pointed out in connection with Figure 7, that
retracing errors on the first trial seem to be significantly
related to the forward errors made even after the first
trial. That is, there seems to exist a direct relationship
between the number of retracing errors made on the
first trial and the rapidity with which forward errors
are eliminated on the subsequent trials. This furnishes
a strong argument in favor of the maximum prevention
of retracing.

FIGURE 8

Mean Number of Retracing Errors per Trial for the
Different Groups in the Early Period of Training

In the trials subsequent to the seventh, all of the groups retain about
the same level as reached at the seventh trial, except for Group
I.13.Sc.C, as shown in the figure (with this group a period of 43 to 45
days separates Trials 7 and 8), and Group I.4.Lb.A, the curve of
which remained at about .5 retracing error per trial throughout, about
double or triple the level of any other group.

The same general relationships are brought out by
the data on the number of trials required by the
different groups to satisfy several norms of learning. The
median numbers of trials for the different groups are
shown in Table 4. The severe norm of learning used
was the standard of three successive errorless runs, or
four errorless runs in five successive runs (with
“errorlessness” defined as freedom from either forward
or retracing errors). The other, or moderate, norm
required three successive runs with a maximum
of but one forward error (retracing errors
disregarded). The scores are in terms of the number of trials
preceding these errorless runs. Group I.13.Sc.E is
not included in this table because in the
13 runs given this group only 39% of the group satisfied
the moderate norm and only 24% the severe norm.
With Group I.4.Lb.A only 49% of the group satisfied
the severe norm in the 30 trials scheduled for the group,
but additional trials were given to determine the
median.

TABLE 4

Median Number of Trials Required by Different Groups
before Meeting the Norms of Learning

                     Median number of       Median number of
                     trials preceding       trials preceding
Group                the moderate norm      the severe norm

I.13.Sc.C            [illegible]            17
II.13.Sc.C            9                     10
I.4.Lb.A             21                     11
I.4.Sc.B             [illegible]            10
[illegible]           1 [?]                  8
II.1.Sc.D             6                      8
I.1.Sc.D              4                      5
F.LC. [?]             7
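The two norms of learning, and the score defined as the number of trials preceding the criterion runs, can be written out as a short routine. This sketch rests on several assumptions. Each trial is recorded as a pair (forward errors, retracing errors). The moderate norm is read as three successive runs, each with at most one forward error. For the four-in-five rule, the counted trials are taken to end just before the first errorless run of the qualifying five. These readings and the function names are hypothetical.

```python
# Hypothetical reading of the two norms; names are illustrative only.
from statistics import median
from typing import Callable, Optional, Sequence, Tuple

Run = Tuple[int, int]  # (forward errors, retracing errors) on one trial


def trials_before_norm(runs: Sequence[Run], ok: Callable[[Run], bool],
                       four_in_five: bool) -> Optional[int]:
    """Number of trials preceding the first run that starts a qualifying
    sequence, or None if the norm is never met."""
    flags = [ok(r) for r in runs]
    n = len(flags)
    for i in range(n):
        if not flags[i]:
            continue
        if i + 3 <= n and all(flags[i:i + 3]):
            return i  # three qualifying runs in succession
        if four_in_five and i + 5 <= n and sum(flags[i:i + 5]) >= 4:
            return i  # four qualifying runs in five successive trials
    return None


def severe(runs: Sequence[Run]) -> Optional[int]:
    # errorless = no forward and no retracing errors
    return trials_before_norm(runs, lambda r: r == (0, 0), four_in_five=True)


def moderate(runs: Sequence[Run]) -> Optional[int]:
    # at most one forward error per run; retracing errors disregarded
    return trials_before_norm(runs, lambda r: r[0] <= 1, four_in_five=False)


rat = [(5, 2), (1, 1), (1, 0), (0, 1), (0, 0), (2, 0), (0, 0), (0, 0), (0, 0)]
print(moderate(rat), severe(rat))   # 1 4
print(median([severe(rat), 6, 9]))  # 6
```

A group median, as in Table 4, is then the median of these per-rat counts. A rat that never meets the norm (None) would need additional trials, as was done with Group I.4.Lb.A.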

The learning curves in terms of time* are shown in
Figure 9. Attention is called to the following points.
First, even with the groups with which retracing was
permitted, the greatest mean time for the first trial was
less than four minutes. This was the case without the
discarding of a single animal, and it is a strong
recommendation for the type of preliminary training used in
the present experiment. Secondly, there is a close
similarity between the split-litter groups, I.4.Sc.B and
I.1.Sc.B'. Closer correspondence exists between the
time curves for these two groups than between the time
curves for any two other groups.

*Taken from the point of leaving the spot marked S on the diagram of
the maze in Figure 1 to the instant of passing the door of the food
box, with no omission of time for stops in the maze. Very few stops
occurred, so that the time corresponds closely to actual running time
anyway.

FIGURE 9

Mean Number of Seconds per Trial for the Different
Groups

As regards transfer from the first to the second maze 
learned, there is relatively little evidence of this in the 
learning curves for errors, but there is very definite 
evidence, in the time scores of the first trial on retest, 
of transfer with respect to speed of running. 

3. Distribution of Scores in Different Periods of
Learning. The data on the distribution of scores in
different periods of learning are an important
supplement to the learning curve. Liggett (18), for instance,
has pointed out the relation of skewness to the problem
of the choice of the best measure of central tendency.
In the theoretical section of the present paper, in the
discussion of the relation between range of ability and
the reliability coefficients, a number of problems were
discussed which can now be illuminated to some extent
by the data on distribution of scores in the present
study.

The sigmas of scores in different periods of learning
may first be considered. Figure 10 and Table 5 show
the sigmas of forward-error scores for the different
groups in successive blocks of three trials. It can be seen
from a comparison with Figure 7 that there is a tendency for the
degree of scatter to vary in proportion to the height of
the learning curve. Thus Group II.1.Sc.D, which has
the lowest learning curve, also tends to have the smallest
sigmas. Groups I.4.Sc.B and I.1.Sc.B' have the next
largest sigmas, as well as the next highest learning
curves, and Groups I.13.Sc.C, II.13.Sc.C, I.13.Sc.E,
and I.4.Lb.A have the largest sigmas and highest
learning curves.

TABLE 5

Variability of Forward-Error Scores at Different Periods
of Learning, as Indicated by the Sigmas of Forward-Error
Scores for the Different Groups in Various Groups of Trials

                                           Trials
Group                        1     2-4    5-7    8-10   11-13  14-16  17-19

I.13.Sc.C                  1.75   3.09   3.94   3.00   2.34   2.48   1.91
II.13.Sc.C [?]              [?]   3.88   4.75   3.48   3.22   2.68   1.89
I.13.Sc.E                  1.59   4.34   4.57   4.93   5.19
I.13.Sc.E, with 4
  extreme rats dropped      [?]   3.55   2.72   2.71   1.49
I.4.Lb.A                    [?]   3.97   3.10   3.54   3.33   3.40   3.02
I.4.Lb.A and I.4.Sc.B
  combined                  [?]   3.78   3.38   3.67   3.40   3.53   3.05
I.4.Sc.B [?]               1.98   3.03   2.55   1.72   1.52   2.49   1.62
I.1.Sc.B'                  2.63   2.81   2.46   2.26   2.13   2.38   2.10
II.1.Sc.D                   [?]   2.94   1.83   2.07   1.43   1.62   1.25
I.1.Sc.D                   6.34   3.49   1.17    .83
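The sigmas in Table 5 are standard deviations of the rats' forward-error scores within blocks of trials. This is a minimal sketch of that computation. It assumes that each rat's record is a list of forward errors per trial, that a block score is the mean number of errors per trial within the block, and that sigma is the population standard deviation. The section does not state which of these conventions was used, and the names are hypothetical.

```python
# Hypothetical helper; block scores as per-trial means is an assumption.
from statistics import pstdev

# Trial blocks of Table 5, numbered from 1 and inclusive.
BLOCKS = [(1, 1), (2, 4), (5, 7), (8, 10), (11, 13), (14, 16), (17, 19)]


def block_sigmas(errors_by_rat, blocks=BLOCKS):
    """Sigma of per-rat block scores for every block that all records cover."""
    result = {}
    for first, last in blocks:
        if all(len(rec) >= last for rec in errors_by_rat):
            width = last - first + 1
            scores = [sum(rec[first - 1:last]) / width for rec in errors_by_rat]
            result[(first, last)] = pstdev(scores)
    return result


print(block_sigmas([[4, 2, 2, 2], [0, 0, 0, 0]]))  # {(1, 1): 2.0, (2, 4): 1.0}
```

The coverage check is one reason a group trained for fewer trials, such as I.1.Sc.D, has entries only in the early columns.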

One line on Figure 10 demands comment, namely,
the line for “Groups I.4.Lb.A and I.4.Sc.B combined.”
These two groups have been combined in certain
calculations in order to illustrate how reliability
coefficients are affected by having unequal feeding for
different members of a group. These groups were run
under similar conditions except for the fact that Group
I.4.Lb.A was fed much more liberally than the other
group. The sigmas of the combined group are seen
to be only slightly higher than the sigmas for the more
variable group (Group I.4.Lb.A), but, as will be
brought out later, the reliability coefficients in every
case but one are quite appreciably higher than for either
group treated separately. These data, accordingly, are
a good illustration of the point that only the most
careful control of experimental conditions can yield
reliability coefficients which are not affected by systematic
errors.
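Pooling two groups whose feeding differs raises the reliability coefficient because a difference between group means adds the same amount to both halves of each animal's score. The simulation below is an illustrative model only. Each hypothetical group has a within-group split-half correlation of about .5, and the more liberally fed group is simply shifted upward by a constant number of errors.

```python
# Illustrative simulation with hypothetical names; not the author's data.
import random


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def half_scores(n, level, rng):
    """Two half-test error scores per rat; `level` shifts the whole group."""
    first, second = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)
        first.append(level + ability + rng.gauss(0, 1))
        second.append(level + ability + rng.gauss(0, 1))
    return first, second


rng = random.Random(7)
liberal = half_scores(2000, level=3.0, rng=rng)  # weaker motivation, more errors
scanty = half_scores(2000, level=0.0, rng=rng)
r_liberal = pearson(*liberal)
r_scanty = pearson(*scanty)
r_combined = pearson(liberal[0] + scanty[0], liberal[1] + scanty[1])
print(round(r_liberal, 2), round(r_scanty, 2), round(r_combined, 2))
```

In this model the combined coefficient comes out near .76 against about .5 for either group alone, even though the consistency of the two halves within each group has not changed.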

The above table and figure also include the data for
Group I.13.Sc.E with four extreme cases dropped from
it, in order to show what effect a few extreme cases
can have. It will be shown later that dropping these
same four rats from the correlations of this group affects
the correlation coefficients as seriously as it affects the
sigmas in this case (although it still does not alter the main
conclusions drawn from the experiment). Such cases
as these offer a rather baffling problem to the
experimenter, whether he is working on the problem of
reliability or on the question of the influence on learning
of some experimental factor. My own judgment would
be that where a few individuals deviate markedly from
the group (say, by three sigmas), the experimenter
should conclude that special factors must be operating
in their case, whether he could specify what those
special factors were or not, and hence should exclude
them from his data. (In this instance I had been
hesitant, beforehand, about using these rats, as the litter had
not seemed to be in the best physical condition.
However, the other two rats of the litter made almost the
best records of the group.)
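The author's rule of thumb, excluding individuals who deviate from the group by about three sigmas, can be sketched as a one-pass filter. This sketch assumes a population sigma and a single pass; the text does not say whether sigma was recomputed after an exclusion. `drop_extreme` is a hypothetical name. With the population sigma, no single score can lie more than √(n - 1) sigmas from the mean (Samuelson's inequality). In a group of ten or fewer, therefore, the three-sigma rule can never exclude anything.

```python
# Hypothetical helper illustrating a single-pass three-sigma exclusion.
from statistics import mean, pstdev


def drop_extreme(scores, k=3.0):
    """Keep only scores within k sigmas of the group mean (single pass)."""
    m, s = mean(scores), pstdev(scores)
    return [x for x in scores if abs(x - m) <= k * s]


group = [5, 6, 4, 5, 7, 6, 5, 4, 6, 5] * 2 + [40]
kept = drop_extreme(group)
print(len(group), len(kept))  # 21 20
```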

As regards the skewness of maze data, and the
tendency for this skewness to increase the longer the
training is continued, the results graphed in Figure 11 may
be taken as typical. Group I.4.Lb.A is the group with
least tendency to skewness of all groups, and Group
I.4.Sc.B is typical of all the remaining groups except
for the fact that the distributions of scores in the
remaining groups (except I.13.Sc.E) are even slightly
more markedly skewed in the later periods of learning
than are the distributions of Group I.4.Sc.B.
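One common summary of the skewness of a distribution of error scores is the moment coefficient g1 = m3 / m2^(3/2), where m2 and m3 are the second and third moments about the mean. The text does not say which skewness measure, if any, was computed. The sketch below uses this one to show how a pile-up of zero scores produces a large positive value. The data are made up for illustration.

```python
# Hypothetical helper; the choice of g1 as the skewness measure is an assumption.

def skewness(scores):
    """Moment coefficient of skewness, g1 = m3 / m2 ** 1.5 (population moments).
    Assumes the scores are not all equal (m2 > 0)."""
    n = len(scores)
    m = sum(scores) / n
    m2 = sum((x - m) ** 2 for x in scores) / n
    m3 = sum((x - m) ** 3 for x in scores) / n
    return m3 / m2 ** 1.5


early = [3, 4, 5, 6, 7]                  # symmetric: g1 = 0
late = [0, 0, 0, 0, 0, 0, 1, 2, 5, 9]    # most rats errorless, a few not
print(skewness(early), round(skewness(late), 2))
```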

The significance of these data for the problem of
reliability is not to be underestimated. Altogether apart
from the fact that such accumulations of zero scores very
possibly affect reliability coefficients, a maze procedure
which yields such an accumulation of zero scores after so
few trials might quite reasonably be condemned for this
reason alone. Other things being equal, group differences
will be more clearly brought out by measuring instruments
which do not have an accumulation of zero scores. On such a
measuring instrument the differences of ability of the
entire group will find expression, whereas with scores
accumulating at zero, one no longer has any differential
measure of a good part of the group. Therefore, it
would seem quite probable that the reliability of the
multiple-T maze would be increased if it were made
considerably longer and more difficult. It would, of
course, be possible with the 12-unit multiple-T maze
to secure data which would not be markedly skewed if
one dealt merely with the first seven or eight trials
(with the experimental conditions the same as those
used in the present experiment). However, to curtail the
training period in this way would shorten the test in a way
that might be expected to lower the reliability. With a longer
and more difficult maze one might have the greater length of
test which is so favorable to reliability, and at the same time
not be compelled to utilize markedly skewed data.

FIGURE 11

Distribution of Forward-Error Scores in Successive Periods
of Training with Groups I.4.Lb.A and I.4.Sc.B

4. Correlations between Scores on the First Trial 
and Scores on Subsequent Trials. Another problem 
closely related to that discussed above is the problem 
of the correlation between performance on the first 
trial (which, as we have seen, is markedly influenced 
by the number of retracing doors used) and scores on 
the subsequent trials. These correlations are of par- 
ticular interest because of the light they throw on the 
suggestion that has been made, particularly by Tolman 
and Tryon, that the data for the first trial or two should 
be discarded. 
TABLE 6

Correlations between Scores on the First Trial and Scores
on Subsequent Trials

               Trials involved      Correlations of     Correlations of
Group          in the correlations  forward errors      time scores

I.13.Sc.C      1 vs. 2-30
II.13.Sc.C     1 vs. 2-20                               [illegible]±.17
I.13.Sc.E      1 vs. 2-[?]           .22±.17*
I.4.Lb.A       1 vs. 2-30                                .23±.18
I.4.Sc.B       1 vs. 2-30            .14±.18             .18±.17
I.1.Sc.B'      1 vs. 2-30           -.02±.18            -.06±.18
II.1.Sc.D      1 vs. 2-30           -.32±.16†            .00±.18
I.1.Sc.D       1 vs. 2-10           -.42±.15            -.21±.17

*In this case the size of the correlation, with one extreme case eliminated,
is .23±.17, which is more descriptive of the tendency with the group as a
whole.

†When these same trials are used, but with all errors on Trial 1 vs.
forward errors on Trials 2-30, the correlation is -.35±.16.

The correlations which have been made‡ (see Table
6) include all of the groups where there was reason to
suspect that the first run might have had some
differential effect among the members of a group. In this series
of correlations the only figures which would seem to
indicate any probable relationships are the correlations
for Group D (II.1.Sc.D and I.1.Sc.D). It will be
remembered that this group had the lowest learning curve
of all the groups, except on the first trial. It may be,
therefore, that with this group conditions were more
nearly optimal for learning than with Group I.1.Sc.B',
which was also trained with but one retracing door, and
that for this reason appreciable negative correlations
were found with it, but not with Group I.1.Sc.B'. These
negative correlations indicate that very probably with
some groups the learning which goes on in the first trial
is not a constant for all rats and therefore cannot be
discarded without eliminating some of the significant data.
When this is considered relative to the fact that scores
on this first trial are largely determined by chance
factors, and that their sigmas are so very large that
inclusion of the scores for the first day may be expected
certainly to obscure the true values in an unusual degree,
we have a strong argument in favor of that type of
maze with which the first trial will be approximately
a constant as far as its effect on learning is concerned.
It is very probable that this type of maze is secured by
the use of doors to prevent retracing.

‡In all of the original calculations of this paper the standard
error is used in preference to the probable error, so that henceforth
in this paper r = .22±.17, for instance, is to be read as “r equals .22,
with a standard error of .17.”
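This section does not give the formula used for the standard errors attached to the correlations, and it does not state the group sizes. As an assumption, the classical large-sample expression SE = (1 - r^2) / √N reproduces the tabled values if each group contains about 30 rats. For r = -.42 and N = 30 it gives .15, and for r = .22 and N = 31 it gives .17. The probable error used by earlier writers is 0.6745 times the standard error. The function names are hypothetical.

```python
# Hypothetical helpers; the formula and the group sizes are assumptions.
import math


def se_r(r, n):
    """Classical large-sample standard error of a product-moment r."""
    return (1 - r * r) / math.sqrt(n)


def pe_r(r, n):
    """Probable error of r: 0.6745 times the standard error."""
    return 0.6745 * se_r(r, n)


for r, n in [(-0.42, 30), (0.22, 31), (-0.32, 31)]:
    print(f"r = {r:+.2f}  SE = {se_r(r, n):.2f}  PE = {pe_r(r, n):.2f}")
```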

5. Relation of Weight Changes to Learning Curves
and Variability. Various studies have shown that speed
of maze learning is related to strength of motivation.
Hence, to understand the reasons for the different
learning curves secured in the present experiment, it is
important to have a careful analysis of the weight changes.
The data on weight changes are presented in two
different ways in Figures 12 and 13. In Figure 12 the
weight curves are plotted on a semi-logarithmic chart,*
along with the curves of normal growth in weight
derived from King (13). It is essential to consider
weight changes not merely in terms of percentage of
gain or loss over a certain period, but also by
comparison with the rate of growth that would probably have
occurred with the same group under normal conditions.
For example, the fact that the weight curve for Group
I.13.Sc.C was rising after the first 10 days does not mean
that this group was necessarily less strongly motivated
on this maze than on Maze II, where the weight curve
fell steadily until almost the end of the training,
inasmuch as the two training periods fell in the periods of
rapid and slow growth, respectively.

*The advantages of this system of plotting on a semi-logarithmic
chart are that equal vertical distances at any point on the chart
represent equal percentages of gain or loss, and lines of
corresponding slope anywhere on the graph indicate that weight is falling off
or increasing at the same percentage rate in each case. This is a distinct
advantage for the inspection of weight curves, because it
is the same percentage change in weight which will make two
groups equal in motivation, rather than the same absolute change in
weight, in case the original weights are different.
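The footnote's point about semi-logarithmic plotting can be checked numerically. The vertical distance between two points on such a chart is proportional to the difference of their logarithms, that is, to the logarithm of their ratio. Equal percentage changes therefore give equal distances whatever the starting weight. The weights below are made-up illustrations, and `chart_drop` is a hypothetical name.

```python
# Hypothetical helper; the weights are illustrative, not experimental data.
import math


def chart_drop(before, after):
    """Vertical drop between two weights on a semi-logarithmic chart (log10 units)."""
    return math.log10(before) - math.log10(after)


# A 200 g rat falling to 160 g and a 100 g rat falling to 80 g both lose 20%:
# the drops are equal on the semi-log chart, although the absolute losses differ.
print(round(chart_drop(200, 160), 4), round(chart_drop(100, 80), 4))
print(200 - 160, 100 - 80)
```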

In order to facilitate the comparison of weight 
changes in terms of their deviation from normal 
growth, Figure 13 has been added to supplement Fig- 
ure 12. Figure 13 shows the relative deviation of the 
different groups from the curve of growth which they 
probably would have had under normal feeding, judg- 
ing on the basis of the growth curves furnished by King. 
In other words, Figure 13 shows the relative degree to 
which the weights of different groups were reduced in 
comparison with normal weights, and it is probably 
safe to say that the heights of the different curves of this 
graph represent roughly the relative strength of moti- 
vation with the different groups. Comparison of this 
graph with Figures 7, 8, and 9 reveals that Group
I.4.Lb.A is the only one in which the rate of learning was
mainly determined by the feeding program used. This
was the group with which weight was least reduced.
As regards the other groups, the amount of retracing
permitted seems to have been more influential in
determining the rate of learning than was the feeding
program used. Thus, of the youngest groups, I.13.Sc.C
had its weight cut more than did either Group I.4.Sc.B
or Group I.1.Sc.B', and yet the learning of Group
I.13.Sc.C did not proceed as rapidly. Similarly, with the
two oldest groups, I.13.Sc.E and I.1.Sc.D, the learning
was more rapid with the group permitted to retrace
than with the group with the greater weight drop but
with retracing prevented. Of the remaining groups,
the most important comparison is between Groups
II.13.Sc.C and II.1.Sc.D, and here likewise the same
situation is found.

One important aspect of the control of weight,
however, cannot be deduced from an inspection of curves of
means. We want to know not only what percentage of
normal weight the group attained at any given point,
but also whether the amount of variability which has
escaped control affected the speed of learning of
different individual rats. Hence, with those groups where
there was most suspicion that such might be the case,
the correlations were calculated between error scores
and loss of weight. The error scores involved in these
correlations were the total forward errors on Trials 2
to 10, inclusive, for Groups II.1.Sc.D and II.13.Sc.C,
and for Group I.13.Sc.E those on Trials 2 to 7 in one
case and on Trials 8 to 13 in another. (These trials
were selected, in the case of the first two groups
mentioned, because they covered the
period of most rapid learning and also probably
represent the most reliable period of learning.) The
weight figure correlated was the percentage of the
original weight to which the weight had fallen by the fourth
trial on the maze with each individual animal. The
figure for weight at Trial 4 was estimated by averaging
the weights on Trials 3, 4, and 5 in order to eliminate
chance effects of daily fluctuations in weight. The
resulting correlations are given in Table 7.
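Two figures enter each of these correlations, and both can be computed per rat and then correlated across the group. The first is the percentage of original weight at Trial 4, estimated from the mean of Trials 3, 4, and 5. The second is the error total for a range of trials. This is a minimal sketch with hypothetical names, and it assumes the records are lists indexed from Trial 1.

```python
# Hypothetical helpers for the weight-versus-errors correlation.

def pct_weight_at(original, weights, trial=4):
    """Percentage of original weight at `trial`, smoothed over trial-1..trial+1."""
    window = weights[trial - 2:trial + 1]  # Trials 3, 4, 5 when trial == 4
    return 100.0 * sum(window) / len(window) / original


def errors_between(errors, first, last):
    """Total forward errors on Trials first..last inclusive."""
    return sum(errors[first - 1:last])


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


print(pct_weight_at(200, [190, 185, 180, 175, 170]))  # 87.5
```

For a group, one would build the list of `pct_weight_at` values and the list of `errors_between(e, 2, 10)` values rat by rat and pass both lists to `pearson`. As the text explains, a negative coefficient would mean that the rats whose weights had been cut more drastically made more errors.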

A negative correlation in these cases would mean that
the rats whose weights had been cut more drastically
tended to make larger scores, and vice versa. The only
one of the correlations which appears possibly
significant is the correlation for Group II.1.Sc.D. However,
the correlation in this case is largely the result of the
influence of a single extreme case. This rat's weight was
not extreme (it had been reduced to 70% of the original
weight, against 75% for the group mean), but its error
score was over six sigmas away from the mean of the group.
With this one rat omitted from the correlation, the figure for
Group II.1.Sc.D is r = -.14±.18. This figure is
probably more indicative of the true relationship (or lack of
relationship) which tended to exist between feeding
and errors within different groups than is the
correlation with this rat included.

TABLE 7

Correlations between Percentage of Original Weight (W)
and Errors on Certain Trials (Trials Involved Indicated
by Subscripts to E)

Group             Correlation

[illegible]       r(W, E 2-10) = [illegible]±.16
II.1.Sc.D         r(W, E 2-10) = [illegible]
I.13.Sc.E         r(W, E 2-7)  = .19±.17
I.13.Sc.E         r(W, E 8-13) = .09±.17

(With Group [illegible] the correlation between percentage loss of weight
on learning and percentage loss of weight on relearning was .20±.17.)

It therefore seems safe to conclude that the weight 
control used in this experiment was exact enough to 
guard against such differences in feeding (within a 
given group) as might have resulted in consistent dif- 
ferences in maze performance. 

6. Reliability Coefficients from the Present Experiment. Having presented the auxiliary material which is necessary to facilitate and safeguard the interpretation of the reliability coefficients of the present investigation, we may now turn to a consideration of those coefficients themselves. Four methods of calculating reliability coefficients have been used: (1) correlation of scores on odd and even trials for different groups of trials; (2) correlation of scores on the first, second, and third groups of ten trials, etc.; (3) correlation of errors in various groups of three trials from the second to the nineteenth trial inclusive; and (4) correlation of scores on test and retest (with the same maze used on test and retest with one group, and with a mirror image of the first maze used in the retest with two other groups). Where the reliability coefficients have been derived from two groups of trials which really constituted a unit of training (such as coefficients from odd vs. even trials or from Trials 1-10 or 11-20, for instance), the correlation coefficients could have been corrected by the Brown-Spearman formula for halving of the data,

$$R = \frac{2r}{1 + r}$$

(with the standard errors of these calculated by the formula provided by Shea (22)), following the procedure of Stone and Nyswander. However, as indicated before, I believe that the correction would not be warranted, because the two halves of maze data are not sufficiently independent to satisfy the necessary conditions for the use of this formula.

Moreover, there is always a constant relation between r and R, so that the same relations existing among the R's would exist also in the uncorrected r's, and anyone who wished the R's could readily calculate them. In general, the relationship runs thus:


r      R       r      R
.10    .18     .55    .71
.15    .26     .60    .75
.20    .33     .65    .79
.25    .40     .70    .82
.30    .46     .75    .86
.35    .52     .80    .89
.40    .57     .85    .92
.45    .62     .90    .95
.50    .67     .95    .97
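The Brown-Spearman step-up is simple enough to regenerate the r to R correspondence above directly. Below is a minimal Python sketch; the function name is only illustrative. As the text argues, the corrected values are not used in this study.

```python
def spearman_brown(r):
    """Step a half-length correlation r up to full length: R = 2r / (1 + r)."""
    return 2 * r / (1 + r)


# Regenerate a few rows of the r -> R correspondence.
for r in (0.10, 0.30, 0.50, 0.70, 0.90):
    print(f"r = {r:.2f}   R = {spearman_brown(r):.2f}")
```

Run as written, this prints R values of .18, .46, .67, .82, and .95, which match the corresponding entries above.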


In evaluating the reliability coefficients from the present study, most weight should be attached to the coefficients from approximately the first 10 trials. As may be seen from Figures 7 and 9, with most of the groups the learning curve did not drop appreciably after the first 10 trials. The correlations of trials after the first 10, therefore, may be fittingly spoken of as correlations of samples of the final level of attainment rather than as correlations of different samples of learning performance. It is interesting to note, however, as will be shown, that even these correlations of errors on the later trials yield fairly high coefficients, both when correlated among themselves and when correlated with the scores on the early portions of learning.

Another reason for attaching greater significance to 
the reliability coefficients of the early trials is that, if 
a maze procedure can be found which will give high 
reliabilities in this portion of the training, that maze 
procedure will, in general, be more useful for studies 
of animal learning than another with high reliabilities 
in some later period, since there is an important econ- 
omy of time in being able to use the first few trials as a 
basis for one’s experimental conclusions, rather than in 
having to run the animal for 20 or 30 trials to secure 
the scores of the latter portions of learning. 

a. Correlations of odd vs. even trials. The reliability coefficients from correlations of odd versus even trials may be presented first. Their values are indicated in Tables 8 and 9.
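The odd vs. even method can be sketched as follows. This is a hypothetical Python illustration, not the author's worksheet. It assumes each rat's scores are stored as a list indexed by trial. For the chosen portion of training, it sums each rat's scores on the odd-numbered trials and on the even-numbered trials, then correlates the two sums across rats. Passing 2 as the first trial gives the variant in which the first trial is dropped.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def odd_even_r(scores, first, last):
    """Correlate each rat's total on odd-numbered trials with its total on
    even-numbered trials, over trials first..last (1-based, inclusive).
    scores[i][t - 1] is rat i's error (or time) score on trial t."""
    trials = range(first, last + 1)
    odd = [sum(s[t - 1] for t in trials if t % 2 == 1) for s in scores]
    even = [sum(s[t - 1] for t in trials if t % 2 == 0) for s in scores]
    return pearson_r(odd, even)


# Hypothetical error scores of four rats on Trials 1-6.
scores = [
    [5, 4, 3, 3, 1, 2],
    [8, 7, 5, 4, 2, 2],
    [3, 3, 2, 1, 0, 1],
    [6, 6, 4, 2, 1, 1],
]
print(round(odd_even_r(scores, 1, 6), 2))  # 0.98
```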

In the case of forward-error scores in the first 10 trials, the three groups run with 13 doors had the highest reliability coefficients; the two groups with 4 doors, the next to lowest coefficients; and the two groups with one door, the lowest coefficients. The same relationship exists for odd vs. even trials in the first 20 trials and in the first 30 trials, except that one of the 1-door groups surpasses one of the 4-door groups. For Trials 11-20 relatively little difference exists between the different reliability coefficients, except for Group II.1.Sc.D. For Trials 21-30 the picture changes markedly from any of the previous ones, except that the 13-door group is still highest, and one of the 4-door groups and one of the 1-door groups hold the intermediate values.

In the case of reliability coefficients from the time 
scores, the same general relationship holds, except that 

TABLE 8

Reliability Coefficients from Error Scores on Odd vs. Even Trials in Different Portions of Training

(The figures in parentheses are rank-difference correlations; the other figures are product-moment correlations and their sigmas.)

               Correlations from odd vs. even trials in different portions of training
Group          1-10            11-20           21-30      1-20            1-30
I.13.Sc.C      .72±.0? (.70)   .70±.09 (.66)   .?6±.??    .83±.05 (.82)   .88±.04
II.13.Sc.C     .74±.08         .90±.03                    .88±.04
I.13.Sc.E*     .78±.07 (.61)   .88±.04
I.4.Lb.A       .37±.13 (.34)   .62±.10 (.?)    .61±.10    .73±.07 (.71)   .74±.07
I.4.Sc.B       .6?±.11 (.53)   .66±.10 (.51)   .17±.17    .58±.12 (.49)   .61±.11
I.1.Sc.B'      .27±.17 (.28)   .71±.0? (.66)   .53±.13    .60±.11 (.51)   .65±.10
II.1.Sc.D      .14±.17 (.31)   .28±.16 (.16)   .10±.18    .37±.15 (.07)   .46±.14
I.1.Sc.D       −.44±.14

*With Group I.13.Sc.E the first correlation is for the odd and even trials in Trials 2-7, and the second for Trials 8-13. With the four extreme cases dropped, these correlations become .57±.11 and .34±.16, respectively.
TABLE 9

Reliability Coefficients from Time Scores on Odd vs. Even Trials in Different Portions of Training

               Correlations from odd vs. even trials in different portions of training
                                                                          All trials
Group          1-10       11-20      21-30      1-20       1-30       except the first
I.13.Sc.C      .77±.07    .68±.09    .72±.08    .80±.06    .80±.06    .79±.07
II.13.Sc.C     .50±.13    .8?±.05               .76±.07               .84±.05
I.13.Sc.E*     .81±.06    .77±.07
I.4.Lb.A       .57±.11    .79±.06    .80±.06    .74±.07    .86±.04    .86±.04
I.4.Sc.B       .??±.16    .77±.07    .50±.13    .41±.15    .59±.12    .81±.06
I.1.Sc.B'      −.04±.18   .41±.15    .11±.18    .10±.18    .20±.17    .51±.13
II.1.Sc.D      −.20±.17   .10±.18    .15±.17    .02±.18    .03±.18    .29±.16
I.1.Sc.D       −.14±.17                                               .21±.17

*The first of the two correlations for Group I.13.Sc.E is for Trials 2-7, and the second for Trials 8-13. With the four extreme cases of this group dropped, the two correlations become .29±.16 and .76±.07, respectively.


the 1-door groups have definitely lower reliability coefficients than with the error scores, while the time-score reliability coefficients of the other groups are about the same. Also, it may be noted that with the three groups with the most rapidly falling learning curves (Groups II.1.Sc.D, I.4.Sc.B, and I.1.Sc.B'), dropping the first trial from the calculations decidedly improves the reliability of time scores.

b. Correlations of groups of ten trials and of fif- 
teen trials. The reliability coefficients from correla- 
tions of scores on the different groups of 10 or 15 trials 
may next be considered. Their values are indicated in 
Tables 10 and 11. 

In the case of the error scores, the groups are more alike in the reliability coefficients derived in this manner than in those derived in any other manner. Particularly in the correlations between the second and third groups of 10 trials, and to some degree between the first and second groups of 10 trials, the coefficients from the different groups are approximately the same. The lowest correlations by this method are for Trials 1-10 vs. 21-30 for Groups II.1.Sc.D and I.4.Lb.A. It is not surprising that with Group II.1.Sc.D the correlation between Trials 1-10 and 21-30 is so low, because in this group more than in any other there was an extremely marked accumulation of zero scores in the latter portions of the training. With Group I.4.Lb.A, however, the low correlations of 1-10 vs. 11-20, 1-10 vs. 21-30, and 1-15 vs. 16-30 are strong evidence against the reliability of the procedure used with this group. Group I.4.Lb.A was the one with the high learning curve (secured by more generous feeding than was given any other group). Insofar as the distribution of scores was concerned, one would therefore have expected relatively high reliability coefficients with this group. The other group run with four retracing doors (Group I.4.Sc.B) had a more restricted feeding, and this group yields the highest reliability coefficients in each case except Trials 1-10 vs. 11-20. It would seem, therefore, that strong, rather than weak, motivation increases the reliability of error scores.

TABLE 10

Reliability Coefficients from Forward Errors in Different Groups of Ten and Fifteen Trials

               Correlations from different groups of trials
               1-10 vs.   11-20 vs.   1-10 vs.   1-15 vs.
Group          11-20      21-30       21-30      16-30
I.13.Sc.C      .56±.12    ?           .30±.16    .41±.14
II.13.Sc.C     .50±.13
I.4.Lb.A       .37±.13    ?           .18±.15    .43±.13
I.4.Sc.B       ?          .59±.12     .51±.13    .6?±.10
I.1.Sc.B'      .46±.14    .49±.14     ?          .58±.12
II.1.Sc.D      .40±.15    .60±.11     −.16±.17   .40±.15

TABLE 11

Reliability Coefficients from Time Scores in Different Groups of Trials

               Correlations from different groups of trials
               1-10 vs.   11-20 vs.   1-10 vs.   2-10 vs.   2-10 vs.
Group          11-20      21-30       21-30      11-20      21-30
I.13.Sc.C      .22±.16    .46±.14     .17±.17    .27±.16    .38±.15
II.13.Sc.C     .40±.14                           .39±.15
I.4.Lb.A       .69±.08    .6?±.09     .64±.09    ?          .57±.11
I.4.Sc.B       .24±.17    .64±.11     ?          .46±.14    .55±.13
I.1.Sc.B'      .40±.15    .63±.11     .41±.15    .54±.13    .58±.12
II.1.Sc.D      .34±.16    .38±.15     .03±.18    −.21±.17   −.20±.17

In the correlations of time scores, there seem to be no very consistent and significant differences between the different conditions, except for the unexpected excellence of Group I.4.Lb.A.

Attention has been called before to the fact that, with 
most of the groups, the learning curve had virtually met 
its lowest value by the tenth trial and that scores after 
that point could more appropriately be spoken of as 
measuring the final level of attainment achieved than 
as measuring learning proper. The above correlations, 
however, indicate that, to an appreciable extent, per- 
formance during this final level of attainment is corre- 
lated with performance in the period of rapidly de- 
creasing errors. Just why this is the case is not clear 
from the data; but the problem is important for fur- 
ther investigation. 
c. Correlations from groups of three trials. The correlations calculated from the different groups of three trials may next be considered. Intercorrelations were calculated between all groups of three trials from Trial 2 to Trial 19 inclusive for all the groups. These are presented in Table 12. It is to be expected that the reliability coefficients calculated in this manner will be somewhat lower than the coefficients calculated from groups of 10 or 15 trials, because shorter tests, in general, tend to give lower reliabilities.
This system of measuring the internal consistency of 

TABLE 12

Intercorrelations between the Forward Errors of Different Groups of Three Trials

Group I.13.Sc.C; N=34
Trials   2-4    5-7    8-10   11-13  14-16
5-7      .35
8-10     .25    .72
11-13    .27    .53    .59
14-16    .33    .63    .56    .68
17-19    .07    .17    .02    .50    .46

Group II.13.Sc.C
Trials   2-4    5-7    8-10   11-13  14-16
5-7      .60
8-10     .39    .69
11-13    .32    .46    .77
14-16    .22    .40    .72    .?0
17-19    .14    .27    .56    .77    .65

Group I.13.Sc.E (with 40 days of rest and 5 days' straightaway training between Trials 7 and 8)
Trials   2-4    5-7    8-10
5-7      .74
8-10     .66    .88
11-13    .64    .81    .84

Group I.4.Lb.A; N=41
Trials   2-4    5-7    8-10   11-13  14-16
5-7      .30
8-10     .14    .49
11-13    .04    .37    .42
14-16    −.04   .24    .57    .37
17-19    .06    .16    .42    .48    .50

Group I.4.Sc.B; N=31
Trials   2-4    5-7    8-10   11-13  14-16
5-7      .58
8-10     .43    .48
11-13    .07    .09    .21
14-16    .02    .20    .41    .33
17-19    .29    .35    .58    .18    .46

Group I.1.Sc.B'; N=31
Trials   2-4    5-7    8-10   11-13  14-16
5-7      .24
8-10     .13    .46
11-13    .46    .12    .48
14-16    .33    ?      .38    .64
17-19    .32    −.27   .30    .65    .64

Group II.1.Sc.D; N=32
Trials   2-4    5-7    8-10   11-13  14-16
5-7      .47
8-10     .25    .42
11-13    .37    .49    .61
14-16    .00    .22    .36    .37
17-19    .25    .19    .22    .05    .03

Group I.1.Sc.D
Trials   2-4    5-7
5-7      .09
8-10     .21    .00
maze performance, however, has distinct merit in that it throws light on the question of what forces are operating to produce consistency between different groups of trials and between portions of the same series of trials. Thus, it can be seen from the tables that the highest coefficients are secured by the closely adjoining groups of trials and the lowest coefficients, in general, by the most remotely removed groups of trials. Hunter (7) called attention to this characteristic in his correlations of Vincent scores, and Tryon (29) has also noted the same characteristic in his data. Tryon has suggested that the proper interpretation of this characteristic of the data is that, apparently, different portions of the course of learning involve somewhat different functions, and he seems to consider these high coefficients from closely adjoining trials to be fairly accurate measures of reliability. However, there is just as much evidence, or more, that this characteristic is to be regarded rather as evidence that various irrelevant factors (factors external to maze-learning ability as such) operate to secure high correlations between closely related portions of the maze performance. These factors, e.g., position habits, are more in the nature of systematic errors raising the reliability coefficients than of anything else.
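The tendency for adjoining groups of trials to correlate most highly can be checked directly against Table 12. The Python sketch below uses the intercorrelations printed there for Group I.4.Lb.A and averages them according to the distance between the groups of trials. The simple arithmetic mean is an assumption made for illustration only. The text's own summary (Table 13) averages coefficients of alienation instead.

```python
# Intercorrelations of forward errors for Group I.4.Lb.A (N = 41), from Table 12.
# Blocks are Trials 2-4, 5-7, 8-10, 11-13, 14-16 and 17-19; lower[i][j] (j < i)
# is the correlation between block i and block j.
lower = [
    [],
    [0.30],
    [0.14, 0.49],
    [0.04, 0.37, 0.42],
    [-0.04, 0.24, 0.57, 0.37],
    [0.06, 0.16, 0.42, 0.48, 0.50],
]


def mean_r_by_lag(lower):
    """Average the correlations of blocks that lie `lag` positions apart
    (lag 1 = adjoining blocks), using a plain arithmetic mean."""
    by_lag = {}
    for i, row in enumerate(lower):
        for j, r in enumerate(row):
            by_lag.setdefault(i - j, []).append(r)
    return {lag: sum(rs) / len(rs) for lag, rs in sorted(by_lag.items())}


for lag, m in mean_r_by_lag(lower).items():
    print(f"blocks {lag} apart: mean r = {m:.2f}")
```

For this group the mean r falls from about .42 for adjoining groups of trials to .23 for groups three apart and .06 for the most remote ones.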

In trying to utilize these correlations of groups of three trials as a basis for comparing the reliabilities found with different groups, the large number of coefficients involved makes it imperative to reduce the data to more manageable dimensions. Consequently, average values have been secured for the different groups by averaging the coefficients of alienation, $k = \sqrt{1 - r^2}$, for the different groups. This procedure has an advantage over merely averaging the reliability coefficients, because reliability coefficients cannot be evaluated directly in terms of their relative size. A correlation of .80 is not twice as meaningful as one of .40, but more than twice as meaningful. The coefficients of alienation correct for this.
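The averaging of coefficients of alienation can be sketched in a few lines of Python. The correlations used are those of Trials 2-4 with the other five groups of three trials for Group I.4.Lb.A, as printed in Table 12. The function names are illustrative.

```python
import math


def alienation(r):
    """Coefficient of alienation k = sqrt(1 - r^2)."""
    return math.sqrt(1 - r * r)


def mean_alienation(rs):
    """Mean coefficient of alienation of a set of correlations."""
    return sum(alienation(r) for r in rs) / len(rs)


# Trials 2-4 vs. 5-7, 8-10, 11-13, 14-16 and 17-19 for Group I.4.Lb.A (Table 12).
r_2_4 = [0.30, 0.14, 0.04, -0.04, 0.06]
print(round(mean_alienation(r_2_4), 3))  # 0.988, as in Table 13

print(round(alienation(0.40), 3), round(alienation(0.80), 3))  # 0.917 0.6
```

For r = .40, 1 − k is only about .08; for r = .80 it is .40. This is the sense in which the higher correlation is more than twice as meaningful.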

Table 13 shows the mean coefficients of alienation of the intercorrelations of each group of three trials with every other group of three trials, and the grand means for each group of rats and each group of trials. It is probably safe to say that we arrive by this means at a fairly satisfactory way of estimating the relative reliability of the different procedures used, insofar as the correlations from groups of three trials can indicate reliability. Judging from this table, the groups of rats which yielded the highest reliabilities are the 13-door groups, with the 1-door groups next, and the 4-door groups lowest. (The more reliable a test, the lower its


TABLE 13

Mean Coefficients of Alienation of All the Correlations of Each Group of Three Trials with All Other Groups of Three Trials

                             Groups of trials
Group          2-4    5-7    8-10   11-13  14-16  17-19   Mean for all trials
I.13.Sc.C      .962   .848   .860   .844   .836   .947    .8?3
II.13.Sc.C     .926   .856   .799   .707   .755   .835    .813
I.4.Lb.A       .988   .943   .899   .928   .917   .928    .934
I.4.Sc.B       .948   .933   .898   .976   .945   .917    .936
I.1.Sc.B'      .948   .962   .926   .857   .881   .876    .908
II.1.Sc.D      .950   .924   .809   .904   .859   .984    .905
Mean for
all groups     .954   .911   .865   .869   .865   .915
coefficient of alienation.) As regards the different groups of trials, the groups 8-10, 11-13, and 14-16 seem the most reliable. This apparent result may, however, reflect the fact that these trials occur in the middle of the training series and are more closely linked by transfer effects and systematic errors to other groups of three trials than are the groups of three trials at either end of the training period.

In the course of this discussion we have suggested a number of times that poor experimental control could raise reliability coefficients in a manner that would not at all be justified. To demonstrate this, reliability coefficients have been secured by combining two of the groups which were run under the same maze conditions (four doors were used with each group) but with different feeding programs. Groups I.4.Lb.A and I.4.Sc.B were so combined, and reliability coefficients were calculated from the combined groups. It will be remembered that Group I.4.Lb.A had been fed rather liberally, and Group I.4.Sc.B rather scantily. The combined group therefore gives much the same effect as would be secured if individual rats in the large group had been fed differently. The reliability coefficients from the combined group (see Table 14) are higher in every case than the correlation coefficients of either group separately, except for the correlation of Trials 2-4 versus 5-7. This raising of the correlation is obviously the result of irrelevant factors existing with this combined group.
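A deliberately simple, hypothetical Python example shows why pooling differently fed groups inflates the coefficients. Within each group the two blocks of trials are uncorrelated, but the groups differ in their general level of errors, and so the pooled correlation comes out high. The numbers are invented for illustration and are not the data of Groups I.4.Lb.A and I.4.Sc.B.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical error totals on two blocks of trials (block 1, block 2).
group_a = ([10, 12, 10, 12], [10, 10, 12, 12])  # lower general error level
group_b = ([20, 22, 20, 22], [20, 20, 22, 22])  # higher general error level

print(pearson_r(*group_a), pearson_r(*group_b))  # 0.0 0.0
pooled_x = group_a[0] + group_b[0]
pooled_y = group_a[1] + group_b[1]
print(round(pearson_r(pooled_x, pooled_y), 2))  # 0.96
```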

Still another influence which may seriously affect 
the accuracy of reliability coefficients is the extension 
TABLE 14

Correlations of Errors on Different Groups of Three Trials for Groups I.4.Lb.A and I.4.Sc.B Separately, and for These Two Groups Combined

                                  Correlations
Trials correlated    I.4.Lb.A    I.4.Sc.B    I.4.Lb.A and I.4.Sc.B
2-4 vs. 5-7          .29         .58         .4?
5-7 vs. 8-10         .49         .4?         ?
8-10 vs. 11-13       .42         .21         .62
11-13 vs. 14-16      .?7         .33         .55
14-16 vs. 17-19      .50         .46         .63


of the range of ability in a group by the inclusion of extreme cases. In product-moment correlations these cases affect the correlation in proportion to their extremeness. (More than one correlation has been cited in the previous pages in which a single case raised the correlation by as much as 20 or 30 points.) The product-moment correlation does not of itself require normality of distribution. Where these extreme cases are the result of the same forces affecting the rest of the group, except that the forces are here operating on a larger scale, such extreme cases should be allowed their unusual influence upon the correlations.

In maze experiments it is very doubtful whether such is the case. Group I.13.Sc.E may be used as an illustration. From Table 15 it may be seen that the four extreme rats have had great weight in determining the correlations with this group, and that their omission leaves quite a different picture. It may also be seen that if rank-difference correlations had been used throughout, a perhaps much more correct estimate of reliability would have been secured (with the total group) than
TABLE 15

Group I.13.Sc.E

Correlation coefficients calculated by the product-moment formula and by the rank-difference formula, with and without four extreme cases included. The upper coefficient in each pair is with the full group of 33 rats, and the lower coefficient with but 29 rats. The rank-difference coefficients are in parentheses.

Trials   2-4          5-7          8-10
5-7      .74 (.60)
         .53 (.41)
8-10     .66 (.56)    .88 (.54)
         .34 (.35)    .44 (.41)
11-13    .64 (.39)    .81 (.58)    .84 (.42)
         .22 (.10)    .26 (.33)    .14 (.15)


if the product-moment correlations had been used. (It might be added here that rank-difference correlations have also been calculated for the odd vs. even trial correlations of Table 8. There, Group I.13.Sc.E was the only group whose correlations were very definitely lower, and the relationship between the different groups was left as with the product-moment correlations.)
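The contrast between the two formulas can be sketched in Python. The scores are hypothetical. The rank-difference formula used is the standard ρ = 1 − 6Σd²/[N(N² − 1)]. Tied scores receive the average of their ranks, and when ties are present this formula is only an approximation.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def rank_difference_r(xs, ys):
    """rho = 1 - 6 * sum(d^2) / (N * (N^2 - 1)), d = difference in ranks."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical error scores of five rats on two groups of trials...
x = [1, 2, 3, 4, 5]
y = [2, 1, 3, 5, 4]
print(round(pearson_r(x, y), 2), round(rank_difference_r(x, y), 2))  # 0.8 0.8

# ...and the same five rats plus one extreme rat.
x6, y6 = x + [30], y + [40]
print(round(pearson_r(x6, y6), 2), round(rank_difference_r(x6, y6), 2))  # 1.0 0.89
```

Adding the single extreme rat lifts the product-moment r from .80 to nearly 1.0. The rank-difference coefficient moves only from .80 to .89, because in ranks the extreme rat counts for no more than any other highest-ranking rat would.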

d. Correlations of scores on different mazes. The final set of reliability coefficients to present are those derived from tests and retests. Data are available on Group C and Group D (see Table 3). Group D was tested first for 30 trials on Maze II. Then, after 30 days of rest and 10 days of a second preliminary training, it was given 10 trials on Maze I, with but one door used in each maze. Group C was tested first on Maze I and then, after a similar interval, on Maze II, with 13 doors used with each maze. Group I.13.Sc.E was given 7 trials on Maze I, then 38 to 40 days of rest, followed by 5 days of training on the straightaway, and then 6 more trials on the same maze. During the interval between the two periods of training the rats were fed an unlimited amount of dry food, and occasional bread, milk, and green food. This brought the weights of all of them back to normal and thus reduced to a minimum the danger that differences in feeding would carry over from test to retest. Restriction of feeding and weight drop began with the beginning of the second preliminary training, in a manner similar to that used in the first preliminary training. Group C was also tested on the elevated maze.

In these cases the correlations were determined using those groups of trials which, on various general grounds, might have been thought to prove most enlightening. Thus, with time scores the first trial was omitted from the correlations, because it had previously been shown that its omission improved the reliability of time scores with most groups. The resulting correlations are given in Table 16.

It will be seen that there is quite a difference between Group C and Group D with respect to the correlations between the performances on Mazes I and II. In the case of Group D no one of the correlations is significant. In the case of Group C the correlations of scores on the two mazes are appreciably lower than the corresponding correlation coefficients from either the test or the retest, but they are still fairly high. The intercorrelations of Maze I and Maze II with Group D similarly reflect the reliability coefficients from Maze I and Maze II separately, for the reliability coefficients for Group
TABLE 16

Correlations between Different Mazes, with Internal-Consistency Correlations from the Same Groups for Comparison

(See Table 12 for additional comparative data.)

                                         I.13.Sc.C vs. II.13.Sc.C   II.1.Sc.D vs. I.1.Sc.D
Trials involved (except that the first   r's from     r's from      r's from     r's from
trial is excluded from all time          time         error         time         error
correlations)                            scores       scores        scores       scores

Test-retest correlations:
1-30 on I vs. 1-20 on II                 .35±.15      .71±.08
21-30 on I vs. 11-20 on II               .69±.09      .58±.11
21-30 on I vs. 11-20 on II,
  but with one extreme case dropped      .34±.15
11-20 on I vs. 11-20 on II               .56±.12
Trials preceding severe norm on I vs.
  trials preceding severe norm on II     .30±.16
Trials preceding moderate norm on I vs.
  trials preceding moderate norm on II   .?6±.15      .28±.16
1-30 on II vs. 1-10 on I                                            −.02±.18     −.02±.18
1-20 on II vs. 1-10 on I                                                         −.07±.18
1-10 on II vs. 1-10 on I
21-30 on II vs. 2-10 on I                                           −.08±.18

Internal-consistency correlations:
1-30 on test, odd vs. even               .79±.07      .88±.04       .29±.16      .46±.14
Odd vs. even on retest (20 trials for
  II.13.Sc.C and 10 for I.1.Sc.D)        .76±.07      .88±.04       .21±.17      −.44±.14
1-10 vs. 11-20 (test)                    .22±.16      .56±.12
1-10 vs. 11-20 (retest)                  .40±.14      .50±.13
1-10 vs. 21-30 (test)                    .17±.17      .30±.16



D on the first maze learned were generally the lowest coefficients of all the groups, and on the retest on Maze I the internal-consistency measures were even lower (see Tables 8, 9, and 12). An analysis of the records of Group D on this second maze, however, makes one suspect that the training on the first maze, with its reverse pattern, had a disturbing rather than a stabilizing effect on the learning of the second maze. Two facts point this way: the mean number of errors on Trial 1 was larger on the second maze than on the first, and the scores on the first trial were considerably more variable on the second maze than on the first. Hence the extremely low intercorrelations between the two mazes with Group D do not necessarily indicate that the reliability of the experimental data for Group D on the first maze is lower than one would have estimated from the reliability coefficients calculated on that maze alone.

The correlations between the performance of Group C on Maze II and on the elevated maze are negligible. The correlation between forward errors in Trials 1-?0 on Maze II and all entrances into blinds on the elevated maze was .20±.17. The correlation between the numbers of trials required on the two mazes to satisfy the norm of learning (three successive errorless runs) was −.07±.18. These results with the elevated maze do not necessarily indicate a low reliability for this type of maze.* The low reliability indicated for this maze may be accounted for in good part by the procedure used, in which the trials were given one after another until learning was completed. Or the reason may have been that on the elevated maze the rats were run at more or less irregular times in the evenings of the day on which they had met the norm of learning on Maze II. After their previous regular running in the afternoon, this may have caused some disturbance.

*See the article by W. R. Miles (20) for comments on some probable advantages of elevated mazes.

e. Correlations of test and retest. The correlations reported above of the same groups on Mazes I and II may be considered to be essentially test-retest coefficients, and to have all the advantages indicated in the early part of this paper for this method. However, the data from Group I.13.Sc.E are presented here separately because they conform exactly to the test-retest formula. The internal-consistency coefficients from this group are gathered together in Table 17 for comparison with the test-retest coefficients. Figures are presented both for the entire group and for the group with the four extreme cases omitted. (It will be remembered that an interval of 43 to 45 days separates Trials 7 and 8.)


TABLE 17

Test-Retest Coefficients from Group I.13.Sc.E, with Internal-Consistency Correlations from the Same Group for Comparison

                          With the four extreme      With the entire
                          cases omitted              group
                          r's from    r's from       r's from    r's from
Trials correlated         time        error          time        error
                          scores      scores         scores      scores

Test-retest:
2-7 vs. 8-13              .61±.11     .?3±.1?        .74±.08     .83±.05
2-4 vs. 8-10                          .34±.16                    .66±.10
2-4 vs. 11-13                         .22±.18                    .64±.10
5-7 vs. 8-10                          .44±.15                    .88±.04
5-7 vs. 11-13                         .26±.17                    .81±.06

Internal-consistency correlations:
Odd vs. even in 2-7       .29±.16     .57±.11        .81±.06     .78±.07
Odd vs. even in 8-13      .76±.07     .34±.16        .77±.07     .88±.04
2-4 vs. 5-7                           .53±.13                    .74±.08
8-10 vs. 11-13                        .14±.18                    .84±.05
It may be seen from the above table that the test-retest correlations compare rather favorably with the analogous internal-consistency correlations, just as the correlations of the scores of I.13.Sc.C with the scores of II.13.Sc.C compared favorably with the internal-consistency correlations. This result is rather in disagreement with the findings of Heron (6). It suggests that the intervals of time used by Heron were probably long enough to permit changes of relative vigor and health which would lower the correlations.



GENERAL SUMMARY AND CONCLUSIONS 


The theoretical discussion of the present paper has 
led to the following conclusions: 

1. To measure the reliability of a difference between group means in maze experiments, the formula which should be used is the customary one,

$$\sigma_{\text{diff}} = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2},$$

or, where it is possible to correlate the scores of the two distributions, the formula

$$\sigma_{\text{diff}} = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2 - 2r_{12}\,\sigma_{M_1}\sigma_{M_2}}.$$

The formula recently proposed by Tryon for the reliability of a difference,



is not the proper formula to use in this connection. It rests on the assumption that the standard deviation of found means around the true mean would be constant regardless of the reliability of the measuring instrument, and this assumption is not in any sense correct.
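The two formulas for the standard error of a difference can be sketched in Python. The function names and the group figures are illustrative only. The figures describe two hypothetical groups of 25 rats with standard deviations of 10 and 15.

```python
import math


def se_mean(sd, n):
    """Standard error of a mean, sigma / sqrt(N)."""
    return sd / math.sqrt(n)


def se_difference(se1, se2, r12=0.0):
    """Standard error of a difference between two means. With r12 = 0 this is
    sqrt(se1^2 + se2^2); when the two sets of scores can be paired and
    correlated, the term 2 * r12 * se1 * se2 is subtracted."""
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r12 * se1 * se2)


se_a = se_mean(10.0, 25)  # 2.0
se_b = se_mean(15.0, 25)  # 3.0
print(round(se_difference(se_a, se_b), 3))       # 3.606, uncorrelated groups
print(round(se_difference(se_a, se_b, 0.5), 3))  # 2.646, correlated scores
```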

2. The value of reliability coefficients in maze experiments lies chiefly in estimating the relative usefulness of different maze patterns and procedures for experimental work. The reliability coefficients are not required as a means of estimating the reliability of a difference between group means. Certain other precautions, however, are much more important than is at present generally recognized, particularly the necessity of securing randomness of sampling and of making the conditions strictly comparable between the compared groups.

3. When maze scores are as markedly skewed as is generally the case with current maze procedures, the median is a more reliable measure of central tendency than is the arithmetic mean.

4. The reliability coefficients from maze experiments cannot be interpreted as indicating directly the reliability of those experiments, because

a. The scores correlated to secure reliability coefficients in maze experiments do not satisfy the criterion of being independent measures of the same thing. Some methods of calculating maze reliability coefficients, however, more nearly approximate this ideal than do others, and accordingly are to be advised. This applies especially to the methods of correlating scores on test and retest, on different groups of trials within the test or retest, and possibly on odd and even trials. The reliability coefficients from correlation of scores on odd and even blinds, or on the first half vs. the second half of the maze, however, are to be recognized as seriously objectionable.

b. The size of reliability coefficients from maze experiments is generally significantly influenced by the degree of heterogeneity of ability of the group tested. Moreover, there is at present no statistical formula that makes possible the accurate comparison of reliability coefficients from groups of different degrees of heterogeneity. The formula commonly used for this purpose with educational and intelligence tests,

$$\sigma\sqrt{1-r} = \Sigma\sqrt{1-R},$$

where σ and r are the standard deviation and reliability coefficient of one group and Σ and R those of the other, is not applicable, because in maze experiments the standard errors of scores are not uniform from group to group but vary greatly in response to differences in the factors governing the height of the learning curve. Accordingly, a particularly important aspect of technique in experiments aiming to determine the relative reliability of different maze procedures is the equating of groups with regard to heterogeneity of ability.

5. The interpretation of the significance of maze scores awaits the empirical determination of their validity.
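Two of the statistical claims in these conclusions can be checked numerically. The sketch below is illustrative only. For conclusion 3 it assumes a lognormal distribution as a stand-in for markedly skewed maze error scores; for conclusion 4b it reads the heterogeneity formula σ√(1 − r) = Σ√(1 − R) as the usual assumption that the standard error of measurement is the same in both groups. The function names and all numeric inputs are hypothetical.

```python
import random
import statistics


def spread_of_mean_and_median(n_rats=40, n_groups=2000, seed=1932):
    """Draw many hypothetical groups of skewed scores and return the standard
    deviations of the group means and of the group medians.

    ASSUMPTION: lognormal(0, 1) scores stand in for skewed maze error totals;
    n_rats=40 roughly matches the 31 to 41 rats per group reported in the text.
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(n_groups):
        scores = [rng.lognormvariate(0.0, 1.0) for _ in range(n_rats)]
        means.append(statistics.mean(scores))
        medians.append(statistics.median(scores))
    return statistics.stdev(means), statistics.stdev(medians)


def predicted_reliability(r, sigma, big_sigma):
    """Solve sigma * sqrt(1 - r) = big_sigma * sqrt(1 - R) for R.

    r, sigma: reliability and S.D. observed in one group;
    big_sigma: S.D. of the group whose reliability R is predicted.
    Valid only if the standard error of measurement is constant across groups,
    which is exactly the assumption the text rejects for maze scores.
    """
    return 1.0 - (sigma / big_sigma) ** 2 * (1.0 - r)


sd_mean, sd_median = spread_of_mean_and_median()
print(f"S.D. of group means:   {sd_mean:.3f}")
print(f"S.D. of group medians: {sd_median:.3f}")
# Hypothetical: r = .80 in a group with S.D. 10, predicted for a group with S.D. 20.
print(round(predicted_reliability(0.80, 10.0, 20.0), 2))
```

With these skewed scores the group medians vary noticeably less from sample to sample than the group means; for a symmetrical, normal distribution the comparison would instead favor the mean.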

In the experimental work, the object has been to determine the influence of various features of maze structure and maze procedure on the reliability of maze experiments. In particular, the objects were (1) to determine the influence of using various numbers of doors to prevent retracing, (2) to determine the relation of methods of feeding to reliability, and (3) to secure more information on test-retest correlations with this maze. For these experiments the multiple-T maze of Stone and Nyswander was used, and their procedure was closely followed, except that a short straightaway maze rather than a problem box was used for preliminary training, and a special room around the maze was used to shelter the rats from distracting sounds during runs. Six groups of 31 to 41 rats each were used.

The results of the experiment are presented not only on reliability but also on the learning curves, variability of scores, and effects of differences in feeding, because these data help in interpreting the reliability coefficients secured. It was found, first, that after the first two trials the learning curves were much lower than those of Stone and Nyswander's groups (see Figure 7). Comparison of procedures would seem to indicate that the difference in preliminary training may have accounted for this. Secondly, the variability of the scores was directly related to the height of the learning curves, being greater with the higher curves (see Figure 10 and Table 5 in comparison with Figure 7). Thirdly, after about the first seven trials the distribution of scores became more and more markedly skewed (see Figure 11). This skewness in itself indicates the need for longer and more difficult mazes to secure maximum reliability. Fourthly, when the scores on the first trial were correlated with the scores on subsequent trials, only negligible correlations were found, except with two of the 1-door groups (see Table 6). With these two groups there were found correlations of -.32 ± .16 and -.42 ± .15 between forward errors on the first trial and forward errors on subsequent trials. These correlations indicate that where retracing is permitted, an appreciable and variable amount of training may go on in the first trial, and that, consequently, the first trial with such a maze cannot be dropped as though it were roughly a constant. When we add to this observation the further considerations that the time and error scores on the first trial are largely determined by chance, and that in a maze where retracing is permitted these first-trial scores are so large and variable as to dominate the total scores, it can be clearly seen that there are serious objections to permitting retracing. Fifthly, practically zero correlations were found between error scores and percentage loss of weight (see Table 7), demonstrating adequate control of feeding and motivation by the system used. By combining scores from a liberally fed and a scantily fed group, however, it was demonstrated that the reliability coefficients could have been unwarrantedly high if there had been less careful control of feeding (see Table 14).
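The effect reported for the combined liberally fed and scantily fed groups can be reproduced with a toy simulation. In this hypothetical model, error scores on two halves of training are unrelated within each group, but every rat in one group carries the same constant shift on both halves, standing in for a group-wide motivation effect. The shift of 5 units, the group sizes, the noise model, and the seed are all assumptions, not the study's data.

```python
import random


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_group(n_rats, group_shift, rng):
    """Hypothetical half-scores: independent noise plus a shift shared by the whole group."""
    first_half = [group_shift + rng.gauss(0.0, 1.0) for _ in range(n_rats)]
    second_half = [group_shift + rng.gauss(0.0, 1.0) for _ in range(n_rats)]
    return first_half, second_half


rng = random.Random(1932)
group_a = simulate_group(40, 0.0, rng)  # e.g. liberally fed (hypothetical)
group_b = simulate_group(40, 5.0, rng)  # e.g. scantily fed (hypothetical shift)

r_a = pearson_r(*group_a)
r_b = pearson_r(*group_b)
r_pooled = pearson_r(group_a[0] + group_b[0], group_a[1] + group_b[1])
print(f"within A: {r_a:.2f}  within B: {r_b:.2f}  pooled: {r_pooled:.2f}")
```

Within each group the correlation stays near zero, yet the pooled correlation is high, because the difference between groups is shared by both halves and is counted as if it were consistent individual ability.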

Reliability coefficients were calculated for both time and errors from odd vs. even trials and from groups of 10 or 15 trials against one another, and, for errors, from the correlations of various groups of three trials within Trials 2-19. To secure test-retest data, two of the groups, after training on a first maze followed by 30 days of rest and unrestricted feeding, were run on a second maze which was a mirror image of the first, and a third group was given further training on the same maze after a first training of 7 trials and an interval of 38 to 40 days of rest and unrestricted feeding.
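As a rough illustration of the odd vs. even trial method, the sketch below totals each rat's errors on odd and on even trials, correlates the two totals, and then applies the Spearman-Brown step-up 2r / (1 + r), the usual correction for halving the data. The data layout, the function names, and the error counts are hypothetical.

```python
def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_half, length_factor=2.0):
    """Estimated reliability of a measure length_factor times as long as the half."""
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


def odd_even_reliability(errors_by_rat):
    """errors_by_rat: one list of per-trial error counts per rat, trial 1 first."""
    odd_totals = [sum(trials[0::2]) for trials in errors_by_rat]   # trials 1, 3, 5, ...
    even_totals = [sum(trials[1::2]) for trials in errors_by_rat]  # trials 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    return r_half, spearman_brown(r_half)


# Hypothetical error counts for four rats over six trials.
errors = [
    [12, 9, 7, 6, 4, 3],
    [20, 15, 11, 9, 8, 6],
    [8, 7, 5, 3, 2, 2],
    [15, 14, 9, 9, 6, 5],
]
r_half, r_full = odd_even_reliability(errors)
print(f"odd-even r = {r_half:.3f}, stepped up = {r_full:.3f}")
```

Because the step-up treats the two halves as parallel measures, it inherits the objection raised in conclusion 4a: if odd and even trials share systematic errors, the corrected figure overstates the reliability of the experiment.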

In general, the reliability coefficients from this study 
are definitely lower than those found with the same 
maze pattern by Stone and Nyswander and by Heron.
This may have been the result, to some extent, of the 
removal, especially by the careful control of feeding, 
of certain systematic errors which may have entered 
into their work to raise the correlations. A more prob- 
able explanation is that whatever forces produced the 
lower learning curves of the present study also caused 
the lower reliability coefficients, as these forces pre- 
sumably would operate in the direction of making the 
problem essentially so much easier that it no longer had 
the same capacity for differentiating between the dif- 
ferent animals. 

In comparing the reliability coefficients from different groups to determine which particular variations of maze procedure and which type of maze scores afford the most reliable measures, it is to be noted that there are some contradictions between the reliability coefficients calculated by different methods. It has also been necessary to interpret the correlations with caution because of the tendency in some cases for extreme scores to dominate the correlations. As the basis for my conclusions, therefore, I have sought consistency of trend, together with inspection of the scatter diagrams to determine which correlations seemed dominated by a few cases and which seemed more the expression of a tendency in the entire group (and accordingly more dependable).
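The caution about extreme scores dominating a correlation can be seen with a deliberately small, made-up example: six pairs of scores with exactly zero correlation, plus one rat that is extreme on both measures. All values are hypothetical.

```python
def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores for six rats: no relation at all between the two measures.
xs = [1, 2, 3, 1, 2, 3]
ys = [1, 1, 1, 3, 3, 3]
print(round(pearson_r(xs, ys), 3))

# Add a single rat that is extreme on both measures.
xs_with_extreme = xs + [20]
ys_with_extreme = ys + [20]
print(round(pearson_r(xs_with_extreme, ys_with_extreme), 3))
```

The coefficient rises from zero to about .98 on the strength of one case, which is the kind of result that inspecting the scatter diagram, as described above, would expose at once.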

When the data are so considered (see Tables 8, 10, 12, 16, and 17 for reliability coefficients from error scores, and Tables 9, 11, 16, and 17 for reliability coefficients from time scores), the following conclusions are indicated:

1. With the multiple-T maze, prevention of re- 
tracing with 13 retracing doors gives higher reliabilities 
than when 4 doors are used, and this in turn gives defi- 
nitely higher reliabilities than the use of a door only at 
the food box. 

2. Strong motivation seems to yield more reliable 
results than moderate motivation. 

3. Time scores are somewhat less reliable than error 
scores. 

4. The reliability of time scores is increased if the 
first trial is not included in the calculations, except per- 
haps with groups where retracing is limited (see Table 
9). 

5. Judging from correlations of test and retest on different mazes, scores in terms of trials-to-learn are less reliable than error scores (see Table 16).

6. The correlation of different parts of the same period of training (whether by correlation of odd and even trials, or of groups of 3, 10, or 15 trials) tends to give higher reliability coefficients than are actually warranted by the reliability of the experiment. Three lines of evidence bear on this: (a) test-retest correlations on different mazes yielded lower figures than the internal correlations from either test or retest (see the figures for Group I.13.i5c.C, Table 16, as illustrating this, even though the internal correlations have not been corrected for halving of the data); (b) the size of the correlations of different groups of three trials with each other varied quite consistently and inversely with the number of trials separating the trials correlated (see Table 12); and (c) in contrast with the above, correlations of test and retest on the same maze (with the one group thus tested) yielded coefficients that compared well with the internal-consistency correlations of the same group (see Table 17).

This sixth conclusion differs from the position taken by Tryon and some other investigators, who would tend rather to accept the reliability coefficients from maze experiments at their face value. Such a procedure would be justified only if those coefficients were derived from scores that were really independent measures of the same thing, that is, only if the scores correlated were not linked by “systematic errors” (factors irrelevant to basic maze-learning ability, such as differences in motivation or in emotional conditioning to handling or to the apparatus). This comment may raise some protest from those who are accustomed to say that an instrument is reliable if it measures reliably that which it does measure. Let us, therefore, examine the sense in which reliability coefficients are actually used. Suppose that with an educational test there have been found high reliability coefficients from the correlation of odd vs. even items, but low correlations of test and retest. Suppose further that this contrast is discovered to result from the fact that the interpretation of the instructions vitally affects the scores on the test, that this interpretation is liable to vary with the same person from time to time, and that it also varies from person to person. It then does not matter that the test on each occasion “measures accurately that which it does measure”; the test is condemned because it does not measure the same thing at different times or in different individuals. Exactly the same situation prevails in maze experiments, as we can now conclude not only from general theoretical arguments but also from the general weight of the experimental data referred to above.

7. The reliability of a maze experiment is a function not merely of the maze pattern but also of many different aspects of the maze procedure, such as the preliminary training and the motivation, even when these various aspects are carefully controlled. I derive this conclusion from the comparison of my reliability coefficients with those secured by Stone and Nyswander (24) and Heron (6). I have no reason to believe that my rat groups were more homogeneous than theirs, or that my maze was easier to learn because of the 4-inch greater length of the alley units. The possibly significant differences which might account for the lower reliability coefficients of the present study must, therefore, be either the use of the sound-deadening room, the difference in preliminary training, or perhaps the securing of a much more nearly optimum feeding program than was used by Stone and Nyswander, despite their belief that they had secured an optimum procedure. Dr. Nyswander has suggested to me that the room may have served to lower the reliability coefficients by producing some degree of emotional disturbance rather than by eliminating distractions. Such an interpretation would not seem warranted, however, because my learning curves are so much lower than those of Stone and Nyswander. Also, I understand that the rats of Stone and Nyswander were run under quite quiet conditions. It may be, therefore, that the difference in preliminary training alone produced the difference in reliability coefficients.
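The argument behind the sixth conclusion can be sketched with a hypothetical model in which every rat has a lasting ability plus a factor tied to one period of training (for instance, its motivation or emotional state during that period). Odd and even trials of one period share that factor, while test and retest do not, so the internal correlation comes out higher than the test-retest correlation. Under these assumptions the population values are about .67 for odd vs. even halves (.80 after the Spearman-Brown step-up) and .40 for test vs. retest; the equal unit variances, the sample size, and the seed are illustrative choices, not data from the study.

```python
import random


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_rats=500, seed=11):
    """Return (odd-even r within the test, test-retest r) for hypothetical rats."""
    rng = random.Random(seed)
    odd, even, test, retest = [], [], [], []
    for _ in range(n_rats):
        ability = rng.gauss(0.0, 1.0)     # lasting maze-learning ability
        occasion_1 = rng.gauss(0.0, 1.0)  # systematic factor present only during the test
        occasion_2 = rng.gauss(0.0, 1.0)  # an independent factor during the retest
        odd_1 = ability + occasion_1 + rng.gauss(0.0, 1.0)
        even_1 = ability + occasion_1 + rng.gauss(0.0, 1.0)
        odd_2 = ability + occasion_2 + rng.gauss(0.0, 1.0)
        even_2 = ability + occasion_2 + rng.gauss(0.0, 1.0)
        odd.append(odd_1)
        even.append(even_1)
        test.append(odd_1 + even_1)
        retest.append(odd_2 + even_2)
    return pearson_r(odd, even), pearson_r(test, retest)


r_internal, r_retest = simulate()
r_internal_full = 2 * r_internal / (1 + r_internal)  # Spearman-Brown step-up
print(f"odd-even: {r_internal:.2f} (stepped up {r_internal_full:.2f}); "
      f"test-retest: {r_retest:.2f}")
```

In this model the internal coefficient is high even though a large part of what it reflects does not carry over from one occasion to the next.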

Finally, attention should be called again to the very great desirability of using the split-litter technique wherever group comparisons are to be drawn. I base this conclusion upon the slight differences between the learning curves and reliability coefficients of Groups l.iSc.B and 1.1.0' (which were run by the split-litter technique), as contrasted with the greater differences between other groups supposedly paired in conditions, such as Groups LiSc.B' and ILl.Sc.D, and Groups L]3 .jS'c.C and l.U.Sc.E. Corey (2) reached the same conclusion in his experiments on the effects of muscular activity on the rate of learning.
THE RELIABILITY AND VALIDITY OF MAZE EXPERIMENTS WITH WHITE RATS

(Summary)

This monograph contains (1) a theoretical analysis of the problems of reliability and validity as they arise in maze experiments and (2) a report of an experimental investigation of the influence of certain factors on the reliability of experiments with the multiple-T maze. The theoretical topics discussed are the significance of various measures of differences between groups, especially with regard to the formula recently proposed by Tryon; the value of the median for maze data in view of the skewness of such data; the relative value of different methods of calculating the reliability of maze experiments; and the influence of variation in ability on the reliability coefficients obtained.

In the experimental investigation, the apparatus and technique resembled those of Stone and Nyswander. However, the preliminary trials were given on a straightaway rather than on a platform escape box, and a specially constructed room was used to protect the straightaway and the maze from outside sounds. The number of doors used to prevent retracing varied from group to group: two groups were tested with one retracing door, two groups with 4 retracing doors, and three groups with 13 doors.

The learning curves are much lower than those of Stone and Nyswander. With most of the groups, the mean errors on the tenth trial were lower than the mean errors of Stone and Nyswander's groups on the thirteenth trial. Possibly because of this, the reliability coefficients found were somewhat lower than theirs.

The variability of the scores varied approximately in proportion to the mean errors, the groups with the lowest learning curves having the smallest variability. On all trials except the first few, the scores are skewed toward the larger values, the skewness growing as training continues. It is shown that this skewness indicates the need for a longer and more complex maze to obtain maximum reliability.

The correlations of percentage loss of weight with errors are so low that they indicate that the control of feeding was sufficiently exact.

The correlations of errors on the first trial with errors on subsequent trials, in two of the groups using one door, indicate that if retracing is not prevented, the results of the first trial cannot be disregarded as though their influence on learning were constant.

The correlation of errors in different groups of trials on the same maze indicates that the highest reliabilities are obtained with the 13-door maze and the lowest with the 1-door maze. In groups run with various degrees of weight reduction, strong motivation appears more favorable to reliability than weak motivation.

The correlations between errors on two multiple-T mazes, for one of the groups using 13 doors, indicate that fairly satisfactory reliability can be obtained with the multiple-T maze. Trials-to-learn appear to be a less reliable type of score than total errors.

LEEPER


THE RELIABILITY AND VALIDITY OF MAZE EXPERIMENTS WITH WHITE RATS

(Abstract)

This monograph contains (1) a theoretical analysis of the problems of reliability and validity as they appear in maze experiments, and (2) a report of an experimental study of the influence of certain factors on reliability in experiments with the multiple-T maze. The theoretical questions discussed are the significance of various measures of group differences, especially with reference to the formula recently proposed by Tryon; the value of the median for maze data, in view of the skewness of such data; the relative value of the various methods of determining the reliability of maze experiments; and, finally, the effect of the range of ability scores on the reliability coefficients obtained.

The apparatus and procedure used in the experimental study resembled those used by Stone and Nyswander. The preliminary trials, however, were given not in a platform escape box but on a straightaway, and a specially built room was used to protect the straightaway and the maze from outside noises. The number of doors used to prevent retracing differed among the groups: only a single retracing door was used in testing two groups, four retracing doors with two further groups, and thirteen doors with three groups.

The learning curves are considerably lower than those obtained by Stone and Nyswander. With most groups the mean number of errors on the tenth trial was already lower than that of Stone and Nyswander's animals on the thirtieth trial. Perhaps for this reason the reliability coefficients obtained were somewhat lower than theirs. The variability of the scores was in general proportional to the mean error scores, the groups with lower learning curves showing the smaller variability. On all trials except the very first, the scores show a skewness toward the higher values, and this skewness became more pronounced in the course of training. It is pointed out that this skewness indicates the need for a longer and much more complicated maze to attain maximal reliability.

The low correlations between percentage loss of weight and errors indicate that the control of feeding was sufficiently exact.

Correlations between errors on the first trial and errors on later trials in two of the one-door groups indicate that where retracing is not prevented, the results of the first trial cannot be discarded as though their effect on learning were constant.

Correlations between errors in different groups of trials on the same maze indicate that the highest reliabilities were obtained with the 13-door arrangement and the lowest with the one-door arrangement. The results with groups differing in degree of weight loss seem to indicate that stronger motivation favors reliability more than weaker motivation does.

Correlations between errors on two multiple-T mazes for one of the 13-door groups indicate that fairly satisfactory reliability can be attained with the multiple-T maze. As a basis of calculation, the number of trials needed to learn appears to be less reliable than the total number of errors.
INTRODUCTION

This monograph presents a comparative and critical study of two of the most elaborate recent attempts to compile lists of recommended books for children: The Winnetka Graded Book List (8) and A Guide to Literature for Character Training (4, 5). Each purports to be the last word in the application of scientific methods, yet each travels its own sweet way in producing its list. Do the lists agree? If not, why not?

Two undertakings more opposed in basic assump- 
tions could hardly be discovered. The Winnetka 
Graded Book List is founded upon the child, the books 
he reads, and likes or dislikes, and what he thinks of 
them. Only incidentally and in negative fashion do 
adults tamper with what the child likes. The volumes 
of A Guide to Literature for Character Training are 
founded on adult opinion, on what the adult thinks is 
good for or interesting to the child, and are offered to 
adults as a finding-list of best books. Not even in a 
negative or supplementary fashion has the child a di- 
rect voice in determining what is recommended for 
him. 

The contrasting assumptions of the Winnetka Graded Book List and A Guide to Literature for Character Training are but another chapter in the familiar conflict of educational philosophies. Shall the school be child-centered or adult-centered? Shall the curriculum serve the child’s immediate needs, or shall it prepare the child for adult needs? Shall adults prepare the mental food of children according to adults’ tastes or according to children’s tastes? The purpose of this study is to inquire in what respects these contrasting assumptions give the same or different results.

The fundamental differences in the procedures of 
the two undertakings need to be clearly understood. 
The significant points in the Winnetka study are the 
following: 

1. The primary data consist of over 100,000 ballots 
from 36,000 children. Each child filled out a ballot 
for every book read during the school year. The books, 
some 800, which received 25 or more ballots constituted 
the preliminary selection of best books. 

2. Thirteen children's librarians were given this list of 800 books and asked to judge their literary merit and general worth. One hundred ten books which three-fourths of the librarians rated either as “not recommended because of low literary merit” or as “not recommended because of subject matter” were excluded.

3. The remaining 686 books are listed in order according to a popularity index, which is the product of the number of cities in which the book was read and the number of children who read and liked it. Supplementary to this order of listing, the books which were approved by three-fourths of the librarians were starred.

4. All children were given the paragraph-meaning section of the Stanford Silent-Reading Test. The resulting scores were translated into values indicating the equivalent reading-grade ability in terms of school grades. The median reading-grade ability of all children reading and liking a book determined the school grade in which it was placed.
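The four Winnetka steps can be summarized as a small pipeline. This is a sketch under stated assumptions, not the compilers' actual procedure: the record fields and book data are hypothetical, a book is taken to be excluded when at least three-fourths of the 13 librarians reject it on either ground, and grade placement is taken as the whole-number part of the median reading grade (so a median of 5.4 places a book in Grade V, consistent with the later convention that Grade III covers 3.00 to 3.99).

```python
from dataclasses import dataclass
from math import floor
from statistics import median
from typing import List

N_LIBRARIANS = 13


@dataclass
class BookRecord:
    title: str
    ballots: int                       # one ballot per child who read the book
    cities: int                        # cities in which the book was read
    liked_by: int                      # children who read and liked it
    liker_reading_grades: List[float]  # reading-grade scores of those children
    librarians_rejecting: int          # "not recommended" on either ground
    librarians_approving: int


def winnetka_sketch(records):
    # Step 1: preliminary selection, books with 25 or more ballots.
    candidates = [b for b in records if b.ballots >= 25]
    # Step 2: drop books rejected by three-fourths of the librarians.
    kept = [b for b in candidates if b.librarians_rejecting < 0.75 * N_LIBRARIANS]
    # Step 3: order by popularity index = cities x children reading and liking.
    kept.sort(key=lambda b: b.cities * b.liked_by, reverse=True)
    # Step 4: grade by median reading grade; star books approved by three-fourths.
    return [
        (
            b.title,
            b.cities * b.liked_by,
            floor(median(b.liker_reading_grades)),
            b.librarians_approving >= 0.75 * N_LIBRARIANS,
        )
        for b in kept
    ]


books = [
    BookRecord("Book A", 30, 5, 20, [4.2, 5.1, 5.6], 0, 12),
    BookRecord("Book B", 40, 8, 30, [6.0, 6.5, 7.0, 7.9], 2, 9),
    BookRecord("Book C", 10, 2, 9, [3.0, 3.5], 0, 13),   # too few ballots
    BookRecord("Book D", 50, 9, 35, [5.0, 5.5], 11, 1),  # rejected by librarians
]
for row in winnetka_sketch(books):
    print(row)  # (title, popularity index, grade, starred)
```

Only the exclusion in step 2 and the starring in step 4 involve adult judgment; every other quantity comes from the children's ballots and reading scores.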

By these procedures the basic list of 800 books was 
determined by the number of children reading each 
book, the listing of the books according to a popularity 
index was determined by children, and the grading of 
each book was determined by the median reading grade 
ability of the children reading it. Only in the elimination of 110 books did competent adult opinion enter into the picture.

When the important features of the volumes of A Guide to Literature for Character Training are considered, competent opinion is seen to determine every step.

1. A specific field is defined, e.g., fairy tales, fiction, or biography; all available lists and catalogues are canvassed for suitable titles; and, as far as possible, the actual books are assembled in the Institute of Character Research at the State University of Iowa.

2. Each book is carefully read by at least three readers of the staff of the Institute, and detailed judgments as to its literary merit and character value, its most suitable grade, ethical situation, moral attitudes, etc., are recorded.

3. About one-half of the books subjected to this detailed analysis are recommended according to five levels of merit, the non-recommended materials falling into four levels of demerit.

The data herein reported fall naturally into three parts. Chapters II, III, and IV consider agreements as to the placement of books in the several school grades. Chapters V and VI consider agreements as to recommendations of merit. Chapters VII and VIII report two supplementary studies designed to yield more precise answers to the major questions involved.



II 


AGREEMENTS AND DISAGREEMENTS AS 
TO GRADING OF BOOKS 

Do the two lists of recommended reading materials agree as to the grading of their books? If not, why not?

Accurate placing of children’s books in the several grades involves two quite distinct problems. One problem is that of relative grading. Here the concern is to test, for example, whether five books, A, B, C, D, and E, should be read in this order from the earlier to the later grades or in some other order. Correlations between the Winnetka list and the Guide should determine whether they agree as to relative grading. When the order A, B, C, D, and E is established, there arises a second and very different problem, that of absolute grading. Should these books go into Grades II, III, IV, V, and VI, or into Grades V, VI, VII, VIII, and IX? Here the question is whether the lists agree as to the average grade placement of their materials. Should they go into Grades II, IV, VI, VIII, and X, or into Grades IV, V, VI, VII, and VIII? Here the question is whether the lists agree as to the range of grades employed.
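The distinction can be made concrete with the grade sequences from the paragraph above, written as grade numbers for books A to E. A correlation coefficient is unchanged when every grade is shifted or stretched by the same linear rule, so it detects only disagreements in order; the mean and standard deviation expose disagreements in average placement and in range. The lists are the paragraph's hypothetical examples, not data from either book list.

```python
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Books A-E graded by four hypothetical lists (grade numbers, not Roman numerals).
placements = {
    "II-VI": [2, 3, 4, 5, 6],
    "V-IX": [5, 6, 7, 8, 9],           # same order, three grades higher
    "II-X by twos": [2, 4, 6, 8, 10],  # same order, wider range
    "IV-VIII": [4, 5, 6, 7, 8],        # same order, narrower range
}

base = placements["II-VI"]
for name, grades in placements.items():
    print(f"{name:14s} r = {pearson_r(base, grades):.2f}  "
          f"mean = {mean(grades):.1f}  S.D. = {pstdev(grades):.2f}")
```

Every list correlates perfectly with the first, yet the means differ by as much as three grades and the standard deviations by a factor of two.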

The Winnetka Graded Book List and A Guide to Literature for Character Training agree reasonably well as to relative grade placement. One hundred fifty-nine books are common to the Winnetka study and Volume I of the Guide.¹ The correlation between the gradings of the two is .665. Since the correlation is influenced by the range of grades involved, a more accurate statement is given by the probable error of estimate. Predicted from the Winnetka list, this is .82 of a grade. That is, if a mathematically best prediction of the Guide grading were made from the Winnetka list, 50% of the books would show differences of less than .82 of a grade. One hundred seventy-six books are common to the Winnetka list and Volume II of the Guide.² Here the correlation is .701 and the probable error of estimate .84 of a grade. Combining the 159 and 176 cases gives a correlation of .812 with a probable error of estimate of .83 of a grade.
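The probable errors of estimate can be recovered, assuming the standard formula PE = .6745 σ√(1 − r²), where σ is the standard deviation of the list being predicted. Pairing each correlation with the Guide's standard deviation given later in this chapter (1.622 for Volume I and 1.743 for Volume II) reproduces the .82 and .84 reported above; that pairing is an inference, not something the text states.

```python
from math import sqrt

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error


def probable_error_of_estimate(r, sd_predicted):
    """Probable error of estimate when predicting a variable with S.D. sd_predicted
    from another variable that correlates r with it."""
    return PE_FACTOR * sd_predicted * sqrt(1.0 - r ** 2)


# Predicting the Guide's grading from the Winnetka grading.
print(round(probable_error_of_estimate(0.665, 1.622), 2))  # Volume I
print(round(probable_error_of_estimate(0.701, 1.743), 2))  # Volume II
```

By definition, half of the books would fall within one probable error of their predicted grade, which is the 50% statement made in the text.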

It is of some interest to know whether these figures hold true for comparisons of the Guide and of the Winnetka list with other lists based on competent opinion. For Volume I of the Guide, 346 books were found graded in one or more of 29 graded lists. Correlating the grading of the Guide against the average grading of these lists gave a correlation of .970. Table 1 displays an abbreviation of the scatter diagram to illustrate what is involved in well-nigh perfect agreement. Here the probable error of estimate of the Guide is .35 of a grade. These figures indicate that a combination of other graded lists must correlate with the Winnetka list about as the Guide does. The Winnetka grading has been checked against three other graded lists. For 366 books common to the Winnetka list and the Wisconsin list (6), the correlation is .827 and the probable error of estimate from the Winnetka .88 of a grade. One hundred ninety-three books were found in both the Winnetka and Pittsburgh (1) lists. Here the correlation is .820, the probable error of estimate from the Winnetka being .62 of a grade. Two hundred sixty-six books were found in the Winnetka and N.E.A. (2) lists. Here the correlation is .830, the probable error of estimate from the Winnetka being .75 of a grade. It is concluded from these correlations that the relative grading of the Guide and of the Winnetka list agrees reasonably well with other graded lists based on competent opinion.

¹Actually four lists are involved, since the 110 books rejected from the Winnetka list and the 399 which failed of recommendation in the Guide are employed.

²Only recommended books are employed here.

TWO BOOK LISTS FOR CHILDREN

TABLE 1

SCATTER DIAGRAM SHOWING CORRELATION BETWEEN GRADING OF VOLUME I OF GUIDE AND TWENTY-NINE GRADED LISTS

A correlation of .970 was calculated from finer grouping.
Italicized figures indicate perfect agreement.

Relative grading, however, is only one aspect of the situation. The same correlations would be obtained if all the books in the Winnetka list, or in the lists prepared by competent opinion, were systematically lowered or raised any number of grades. The discrepancies appear in absolute grading. Here, for example, is the distribution of the 159 books common to the Winnetka list and Volume I of the Guide:


Grades      I   II  III   IV    V   VI  VII VIII   IX   Mean*   S.D.
Winnetka    0    0    3   37   72   44  ...  ...  ...    5.57    .815
Volume I    8   16   24   41   43   14  ...  ...  ...    4.73   1.622

*Means calculated assuming that books in Grade III, for example, cover the range from Grade 3.00 to Grade 3.99.


The important facts to note are the following. The average grade according to the Winnetka list is 5.57, while according to the Guide the average is 4.73, or almost a grade lower. The standard deviation, a measure of the range of grades, is .815 according to the Winnetka list and 1.622 according to the Guide, or twice as much. Of the 159 books, 153 are concentrated in Grades IV, V, and VI by the Winnetka, while the Guide locates only 98 in these grades. Essentially the same contrast is given by the 176 books common to the Winnetka list and Volume II of the Guide:


Grades      II  III   IV    V   VI  VII VIII   IX    X   Mean   S.D.
Winnetka     0    0    2   12   24   78   47   13    0   7.61  1.031
Volume II    1    7    9   24   32   44   31   17   11   7.26  1.743


The discrepancies are not so marked as with Volume I but are of the same type. The Winnetka average is higher and the standard deviation much smaller. Of the 176 books, 149 are placed in Grades VI, VII, and VIII by the Winnetka, while the Guide places only 107 in these grades.

Comparing the Winnetka with the Wisconsin, Pittsburgh,
and N.E.A. lists,* we have the following figures
showing precisely the same tendencies:


Grades:      I    II   III   IV    V     VI    VII   VIII  IX    X     Total  Mean  S.D.

Winnetka     0    0    2     39    72    56    58    32    9     0     268    6.47  1.39
N.E.A.       9    19   27    30    58    39    41    36    9     0     268    5.83  2.02

Winnetka     0    0    2     21    42    45    51    25    7     0     193    6.67  1.35
Pittsburgh   2    8    18    30    45    42    33    15    0     0     193    5.78  1.60

Winnetka     0    0    8     40    90    72    80    59    17    0     366    6.55  1.46
Wisconsin    29   15   24    46    67    44    52    37    40    12    366    6.11  2.41


A general tendency appearing in these data is that 
books which competent opinion places in Grades I, II, 
and III are located one or two grades higher by the 
Winnetka, while books which competent opinion places 
in Grades IX and X tend to be located a grade lower 
by the Winnetka. Nor does this tell the whole story, 
since the lists prepared by competent opinion on which 
we have reported are for the elementary grades. If 
lists for the high school are added, it appears that books 
placed in Grades IX, X, XI, and XII by competent 
opinion are on the average located in Grades VII and 
VIII by the Winnetka. 

Table 2 in abbreviated form (compare with Table 1)
gives the combined data showing agreements as to
relative and absolute grading between the Winnetka

*In preparing the Wisconsin, Pittsburgh, and N.E.A. distributions, a book
recommended for Grades III to V, for example, is called a fourth-grade
book, etc.
TABLE 2

SCATTERDIAGRAM SHOWING AGREEMENT BETWEEN THE
WINNETKA LIST AND THREE LISTS REPRESENTING
COMPETENT OPINION

Italicized figures indicate perfect agreement.


Competent opinion

I II III IV V VI VII VIII IX X


IX 




i 




8 

10 

11 

VIII 





2 

8 

35 

32 

22 

8 

VII



1 

2 

24 

37 


21 

24 

3 

VI



a 

18 

61 

3 i 

IC 

8 

3 


V 

10 

8 

33 

63 

¥2 

11 

5 

1 


1 

IV 

20 

20 

18 

12 

S 

3 

1 





III 7 4 

II 
1 


and the two volumes of the Guide and the Wisconsin
list. The Wisconsin list is selected because it is more
recent and a larger number of books are involved. In
all, 175 of the 701 books (some of which are duplicated)
show perfect agreement, 312 show a difference
of one grade, 158 show a difference of two grades, 43
books are displaced three grades, and eleven are displaced
five grades. The average difference is 1.16 grades.
This is a 47% reduction from a purely chance arrangement.

What must happen in order to adjust the absolute
grading of the Winnetka to conform with competent
opinion? The fourth-, fifth-, and sixth-grade books
must be dropped one whole grade and the third-grade
books dropped two whole grades. When this is done,
the average discrepancy falls to .97 of a grade, or a 59%
reduction from purely chance arrangement. What
must happen to adjust the grading of competent opinion
to conform to the Winnetka? The first-, second-, and
third-grade books must be moved up to Grade IV, and
ninth- and tenth-grade books placed in Grade VIII.
Then the average difference falls to .61 of a grade, or a
[…]3% improvement over chance arrangement. In the
process of altering the absolute grading, we have
touched neither the relative gradings of the lists nor
their relative agreements. The gains in agreements are
due entirely to changing the absolute grading. So far,
competent opinion and data from children do not give
the same results. How are the discrepancies to be
explained? This is the problem of the next two chapters.



III

EXPLANATION OF DISCREPANCIES IN
ABSOLUTE GRADING: THE DISTRIBUTION
OF BALLOTS

In the preceding chapter, evidence was presented
showing that the relative grading of the Winnetka
Graded Book List agrees reasonably well with A Guide
to Literature for Character Training and that both
agree with other lists based on competent opinion.
Large disagreements, however, are found in absolute
grading. In explanation of these discrepancies, the
following pages point out two disturbing factors in the
Winnetka study, one operating in the collection of the
original data and another in its statistical treatment.
Corrections for these factors have been confined to the
159 books common to the Winnetka list and Volume I
of the Guide and their rejected lists. The corrected
Winnetka grades which result are in almost perfect
agreement with the grading of the Guide.

We shall consider in this chapter the failure of the
Winnetka list to obtain an adequate sampling of ballots
from children in the lower and higher grades. In
searching for possible explanations of the large
discrepancies between the grading of the Guide and the
Winnetka list, the data of Table 3 seemed important.
Table 3 quotes a statement and a table from page 14
of the introduction to the Winnetka Graded Book List.*
It purports to display the number of ballots received
from each grade of reading ability. According to the

*Original edition.

TABLE 3

TABLE AND ACCOMPANYING STATEMENT QUOTED FROM PAGE 14
OF THE Winnetka Graded Book List PURPORTING TO GIVE THE
NUMBER OF BALLOTS RECEIVED FROM CHILDREN OF EACH DEGREE
OF READING ABILITY

Number of ballots. About 100,000 filled-out ballots
were returned to us. Half of these (53,228 to be exact)
were on 796 books, on each of which there were at least
25 ballots. The other half were scattered over 8,500
books, none of which had as many as 25 ballots. Of the
53,228 ballots on books read by 25 or more children, 22,184
were from boys and 31,044 from girls. These were
distributed according to the reading grade of the children
as follows:


TABLE NO. I

            Boys      Girls     Totals

3rd          243        322        565
4th        2,972      4,185      7,157
5th        6,016      8,267     14,283
6th        3,775      6,867     10,642
7th        5,629      7,359     12,988
8th        2,978      3,509      6,487
9th          446        390        836
10th         125        145        270

          22,184     31,044     53,228

quoted table, no ballots were received from second- 
graders, only 565 ballots were received from third- 
graders, while fourteen, ten, and thirteen thousand 
ballots were received from fifth-, sixth-, and seventh- 
graders. These data seemed to explain the whole situ- 
ation. Obviously, if no ballots were received from 
second-graders and only a very few from third-graders, 
there was no chance of any book being placed in the 
second grade and very little chance of any book being 
placed in the third grade. 

This explanation, however, had to be discarded. Upon
inquiry, Mr. Washburne wrote that the explanation
of the table had been misstated. The table gave the
number of ballots on books finally classified in the given
grades. That is, there were 565 ballots returned on
books finally classified in the third grade, and so forth.

The request was then made for the distribution by
grades of the ballots. Mr. Washburne graciously
consented to make the tabulation. The data given in Table
4 with the following explanation were received from
Miss Vogel. The italics are hers.

“The enclosed sheet entitled Number of ballots on graded
books (see Table 4) gives the total number of ballots
received on all books included in the graded book list and the
excluded list. Read the table as follows: There were
1040 ballots received from children with second grade
reading ability. This is 1.9% of the total number of
ballots received (54,791). These 1040 ballots were on 256
different books, thus making the number of ballots received
per book by second grade children 4.1. Similarly, the
number of ballots per book filled out by third grade
children was 6.2, by fourth grade children 12.2, by fifth
grade children 13.2, etc. Because of the larger number
of books reported on by children in the middle grades, this
measure is, of course, a much better one than the total
number of ballots. The last column, entitled Per cent of
ballots per book, gives the proportion of ballots per book
received from children of each reading grade.

“Mr. Washburne and I talked the matter over with
Professor - - - - of the University of Chicago. He
advised us as follows: ‘It seems to me that there should be
a correction made for the varying percentage of ballots per
book. You could do it on the books for third and fourth
grades and see what difference it makes. I suspect the
grade changes will be few. I should take 3 as a
multiplier for ballots in grade two, and 2 as the factor for grade
three, leaving the rest up to grade nine unchanged. Thus,
if you found on a book 100 ballots for grade two, 400
TABLE 4

NUMBER OF BALLOTS, NUMBER OF BOOKS, AND NUMBER OF
BALLOTS PER BOOK ACCORDING TO SCHOOL GRADE FOR 789 CASES
OF BOOKS RECEIVING TWENTY-FIVE OR MORE BALLOTS


Reading   No. of    % of      No. of   No. of ballots   % of ballots
grade     ballots   ballots   books    per book         per book

II          1040      1.9      256          4.1             4.8
III         3043      5.6      491          6.2             7.4
IV          8491     15.5      699         12.2            14.5
V          10044     18.3        ?         13.2            15.7
VI          9103     16.6      766         11.9            14.1
VII         8837     16.1      733         12.1            14.4
VIII        722?     13.2      687         10.5            12.5
IX          409?      7.5      576          7.1             8.3
X           1961      3.6      467          4.2             5.0
XI           951      1.7      389          2.8             3.3
Total     54,791






ballots for grade three, and 100 for grade four, the change
would give the following distribution: 300 for grade two,

800 for grade three, and 100 for grade four. This book
would still be in grade three.’ ”

Table 4 shows very clearly that a relatively large 
number of ballots were obtained from Grades IV to 
VIII and a relatively small number from Grades II, 
III, IX, X and XI. The effect of this uneven distri- 
bution of ballots is to pull the preferred grade of all 
books toward the middle grades. What correction 
shall be applied to eliminate this factor? 

The above letter suggests that the correction be made
on the basis of the varying percentages of ballots per
book. Specifically, it recommends that the second-grade
ballots be multiplied by a factor of 3 and the third-grade
ballots by a factor of 2, and presumably that the ninth-,
tenth-, and eleventh-grade ballots be multiplied by
factors of approximately 2, 3, and 4.5. With the basis for
this correction the author disagrees. The specific
corrections are not so seriously in error. They go in the
right direction, but do not go far enough. The proper
correction should be based on the percentage of ballots
from each grade of reading ability. That is, instead
of multiplying the number of ballots by the following
factors based on the percentage of ballots per book, as
suggested:

Grades:    II   III  IV   V    VI   VII  VIII  IX   X    XI

Factors:   3.2  2.1  1.1  1.0  1.1  1.1  1.3   1.9  3.1  4.8

the proper correction should employ the following factors:

Grades:    II   III  IV   V    VI   VII  VIII  IX   X    XI

Factors:   9.7  3.3  1.2  1.0  1.1  1.1  1.4   2.5  5.1  10.5

For Grades II and XI the proper correction is two or 
three times as severe. 

The suggested correction was adopted upon an
inadequate analysis of the situation. Evidently the
reasoning employed was as follows. If the second-grade
ballots are multiplied by a factor of 9.7, then the number
of ballots per book read by second-graders would be
39.6, a number all out of proportion to the number of
ballots per book read by fifth-, sixth-, and
seventh-graders, where the similar numbers are 13.2, 11.9, and
12.1. The analysis should have proceeded a step
further. If only books receiving at least 25 ballots
originally are considered and a factor of 9.7 applied, then the
minimum number of all the ballots for books read by
second-graders would be 33.7, while the minimum
number of all the ballots on books read by fifth- and
sixth-graders would continue to be 25. Obviously,
under these conditions the average number of ballots per
book from second-graders must be much higher than
for fifth- and sixth-grade children.

The assumption that the number of ballots per book
must be the same for all the grades holds, if it holds at
all, only when we consider multiplying all ballots by
corrective factors. If this were done, then several
hundred books which originally did not achieve 25 ballots
would be candidates for inclusion in the list, and the
number of ballots per book might prove to be similar
from grade to grade.

There is, however, a much simpler and more direct
way of evaluating the merits of the alternative
corrections. So far the question has been, "What would have
happened if 10,044 ballots had been received from each
grade on books achieving at least 25 ballots?" If this
question is restated, "What would have happened if
only 951 ballots had been received from each grade on
books achieving at least 25 ballots?" then the logic of
the situation is no longer so involved. The proper
correction would then be to divide by the following
factors:

Grades:    II   III  IV   V     VI   VII  VIII  IX   X    XI

Factors:   1.1  3.2  8.9  10.6  9.6  9.3  7.6   4.3  2.1  1.0

If these factors are applied to the distribution of ballots
for a given book and the average or median or modal
grade determined, the result will be precisely the same
as that obtained by multiplying by the more severe
factors given on page 268.
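The equivalence rests on a simple fact: multiplying each grade by n_V/n_g and dividing it by n_g/n_XI differ only by the constant n_V/n_XI, and rescaling every grade by the same constant leaves the mean, median, and mode of a distribution unchanged. The Python sketch below checks this for the median, using the Table 4 ballot counts for Grades II to VI (with the Grade IV count read as 8491) and the original ballots on Bow-wow and Mew-mew from Table 6. The interpolation rule (grade g spans g.00 to g.99, median taken at N/2) is an assumption of the sketch, not a statement of the author's procedure. The equality is exact only for unrounded factors; the one-decimal factors printed in the text agree only approximately.

```python
# Table 4 ballot counts for the grades this example needs (II-VI).
BALLOTS = {2: 1040, 3: 3043, 4: 8491, 5: 10044, 6: 9103}
N_V, N_XI = 10044, 951  # Table 4: most ballots (Grade V), fewest (Grade XI)


def grouped_median(counts):
    """Median of {grade: count}; grade g is assumed to cover [g, g + 1)."""
    half = sum(counts.values()) / 2
    cum = 0.0
    for g in sorted(counts):
        c = counts[g]
        if c > 0 and cum + c >= half:
            return g + (half - cum) / c  # linear interpolation inside grade g
        cum += c
    raise ValueError("distribution has no ballots")


def multiply_correction(dist):
    # Scale every grade up to the size of Grade V: factor = N_V / n_g.
    return {g: b * N_V / BALLOTS[g] for g, b in dist.items()}


def divide_correction(dist):
    # Scale every grade down to the size of Grade XI: divisor = n_g / N_XI.
    return {g: b / (BALLOTS[g] / N_XI) for g, b in dist.items()}


# Original ballots on Bow-wow and Mew-mew (Table 6), Grades II-VI.
book = {2: 10, 3: 15, 4: 28, 5: 14, 6: 2}
up = grouped_median(multiply_correction(book))
down = grouped_median(divide_correction(book))
print(f"median after multiplying: {up:.4f}; after dividing: {down:.4f}")
```

Both corrections print the same median; only the overall size of the corrected distribution differs.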

There is a difficulty, however, in using the data of
Table 4, since it gives the distribution of ballots only on
TABLE 5

NUMBER OF BALLOTS ACCORDING TO READING GRADE FOR ALL
BOOKS BALLOTED UPON


Reading grade    No. of ballots

II                    2010
III                   6323
IV                   14881
V                    17584
VI                   16452
VII                  16237
VIII                 13815
IX                    8436
X                     4571
XI                    2431
Total               102741


books achieving 25 ballots. Accordingly, the request
went forth for all the 50,000 ballots on books receiving
less than 25 ballots, and again in the finest spirit of
cooperation the data were made available. With these
ballots at hand, the data on a sample of every twelfth
book were tabulated. Multiplying the results by 12 and
combining them with the data of Table 4 gives the
distribution of all ballots according to degrees of reading
ability. These are displayed in Table 5. This provides
the data necessary for an improved correction. The
exact factors are as follows:

Grades:    II   III  IV   V    VI   VII  VIII  IX   X    XI

Factors:   8.8  2.8  1.2  1.0  1.1  1.1  1.3   2.1  3.8  7.2
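These factors can be recomputed from Table 5 by scaling every grade up to Grade V, the grade with the most ballots: factor = (ballots in Grade V) / (ballots in the grade). The Python sketch below assumes that rule, which the text implies but does not state outright. It reproduces the printed factors to one decimal for every grade except Grade II, where 17584/2010 comes out near 8.75 and rounds to 8.7 rather than the printed 8.8. The Table 5 entries also sum to 102,740, one less than the printed total, so a small rounding or transcription difference in the original is possible.

```python
# Ballots on all books balloted upon, by reading grade (Table 5).
TABLE_5 = {
    "II": 2010, "III": 6323, "IV": 14881, "V": 17584, "VI": 16452,
    "VII": 16237, "VIII": 13815, "IX": 8436, "X": 4571, "XI": 2431,
}


def correction_factors(ballots, reference="V"):
    """Factor per grade = ballots in the reference grade / ballots in that grade.

    Rounded to one decimal, as the text prints them (assumed rule).
    """
    ref = ballots[reference]
    return {grade: round(ref / n, 1) for grade, n in ballots.items()}


factors = correction_factors(TABLE_5)
print("  ".join(f"{g}: {f}" for g, f in factors.items()))
print("sum of Table 5 entries:", sum(TABLE_5.values()))
```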

In applying these corrections, in order to avoid cum- 
bersome decimals, the actual factors used were: 

Grades:    II   III  IV   V    VI   VII  VIII  IX   X    XI

Factors:   8.5  2.5  1.0  1.0  1.0  1.0  1.0   2.0  3.5  7.0

Application of these corrections to the distribution 
of ballots on two books will illustrate their effect. 
TABLE 6

ORIGINAL DISTRIBUTIONS AND CORRECTED DISTRIBUTIONS FOR TWO
BOOKS, ILLUSTRATING THE EFFECT OF CORRECTING FOR THE
ORIGINAL INADEQUATE SAMPLING FROM THE DIFFERENT GRADES

Bow-wow and Mew-mew (Craik)

Grade:                   II   III   IV    V     VI    Median grade
Original distribution    10   15    28    14    2     4.36
Effect of correction     85   37    28    14    2     2.98

Captain Blood (Sabatini)

Grade:                   VI   VII   VIII  IX    X     XI    Median grade
Original distribution    1    4     4     6     7     6     9.82
Effect of correction     1    4     4     12    24    42    10.94

Table 6 presents for two books the original distribution
of ballots and the distribution obtained from the
correction. The original distribution of ballots on Bow-wow
and Mew-mew is 10, 15, 28, 14, and 2. The
median of the reading grades of children reporting on
this book according to the original distribution is 4.36.
This drops to 2.98 when the corrections are applied to
the original distribution. This is a typical result for
all books which the Winnetka Graded Book List places
in the second, third, and fourth grades. For Captain
Blood the opposite effect results from correction. This
is typical of books originally placed in the eighth, ninth,
and tenth grades. In the case of Captain Blood, the
distribution is such as to suggest that if 17,000 ballots
were obtained from twelfth-graders, its placement
would be even higher.
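To see how the rounded factors move a median, the Python sketch below applies them to the two original distributions of Table 6. The median rule is an assumption: grade g is taken to cover g.00 to g.99, and the median is interpolated at N/2. Under that rule the sketch gives 2.98 for the corrected Bow-wow and Mew-mew, matching the table, and about 4.34, 9.83, and 10.93 for the other three medians, within .02 of the printed 4.36, 9.82, and 10.94. The gaps may come from the table's rounding of corrected counts (7 × 3.5 = 24.5 is printed as 24) or from a slightly different interpolation rule in the original.

```python
# Rounded correction factors used in the study, keyed by grade number.
USED_FACTORS = {2: 8.5, 3: 2.5, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0,
                9: 2.0, 10: 3.5, 11: 7.0}


def grouped_median(counts):
    """Median of {grade: count}; grade g is assumed to cover [g, g + 1)."""
    half = sum(counts.values()) / 2
    cum = 0.0
    for g in sorted(counts):
        c = counts[g]
        if c > 0 and cum + c >= half:
            return g + (half - cum) / c  # interpolate within grade g
        cum += c
    raise ValueError("distribution has no ballots")


def correct(dist):
    """Multiply each grade's ballots by that grade's correction factor."""
    return {g: n * USED_FACTORS[g] for g, n in dist.items()}


bow_wow = {2: 10, 3: 15, 4: 28, 5: 14, 6: 2}            # Grades II-VI
captain_blood = {6: 1, 7: 4, 8: 4, 9: 6, 10: 7, 11: 6}  # Grades VI-XI

for title, dist in [("Bow-wow and Mew-mew", bow_wow),
                    ("Captain Blood", captain_blood)]:
    before, after = grouped_median(dist), grouped_median(correct(dist))
    print(f"{title}: original median {before:.2f}, corrected median {after:.2f}")
```

The heavy weights on Grades II and XI are what pull the two medians in opposite directions: the first book gains mass at the bottom of its range, the second at the top.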

What is the effect of these corrections on the group
of books common to the Winnetka list and Volume I
of the Guide? Table 7 presents the data for the 159
TABLE 7

EFFECTS OF CORRECTIONS FOR UNEVEN DISTRIBUTION OF BALLOTS
ON THE WINNETKA MEDIAN GRADES

Variables

1. Median grades calculated from original Winnetka distributions.
2. Median grades calculated from corrected Winnetka distributions.
3. Grading of Volume I of Guide and its rejected list.


            Distribution of books by grade
Variables   I    II   III   IV    V     VI    VII   VIII  IX    X     XI    Mean   S.D.

1           0    0    3     37    72    44    2     1     0     0     0     5.57   .815
2           0    12   21    33    49    28    13    2     1     0     0     5.28   1.518
3           8    16   24    41    43    14    9     2     2     0     0     4.73   1.622


Correlations of grading of Guide and rejected list with:
    Winnetka original medians     .665
    Winnetka medians corrected    .652


books which were found to be common to the two lists.*
The Guide and its rejected list place eight books in the
first grade, sixteen in the second, and so on. Their
mean grade on the 159 books is 4.73, with a standard
deviation of 1.622. The original Winnetka medians
yield a mean grade of 5.57, with a standard deviation
of .815. The discrepancies are markedly lessened by
the corrections. The correction lowers the mean of the
Winnetka medians .29 grades and increases the standard
deviation until it closely agrees with the Guide.
On the whole, the correction lowers the Winnetka
grading on the 159 books because these books fall for the
most part in the lower grades. Although correction
for the uneven distribution of ballots results in the
marked shifting of the Winnetka grading, it is to be
observed from the correlations that the relative order
of the books is not disturbed.


*In reality, four lists, since grades are available for books which
both lists fail to recommend.




IV


EXPLANATION OF DISCREPANCIES:

MODE VERSUS MEDIAN

It has just been demonstrated that one of the reasons 
for the discrepancies between the absolute grading of 
the Guide and the Winnetka list is the failure of the 
Winnetka list to obtain an adequate sampling of bal- 
lots from all the grades of reading ability. Correction 
was made for this factor. The corrected Winnetka 
medians are in much closer agreement with the Guide. 
We turn now to a second factor explaining these dis- 
crepancies: the statistical treatment of the data. 

TABLE 8

DISTRIBUTION OF BALLOTS AFTER APPLICATION OF CORRECTIONS
FOR THREE BOOKS SHOWING MODE BY INSPECTION
IN THE THIRD GRADE


Book     Distribution of ballots by grade                Median
number   II   III   IV    V     VI    VII   VIII  IX

1        25   60    39    37    26    5     2            4.18
2        34   47    37    29    15    6     3     4      4.18
3        8    ?     14    13    8     2                  4.32


Table 8 presents the distribution (after applying
corrections for inadequate sampling of ballots) of a
sample of three books for which a definite mode by
inspection appears in the third grade. For book Number
1 there are 25 ballots from children of second-grade
reading ability, 60 ballots from third-graders,
and 39, 37, 26, 5, and 2 ballots from fourth-, fifth-,
sixth-, seventh-, and eighth-graders. This book is most
widely read and apparently best liked in the third
grade. What statistical measure shall be applied to
determine the exact best grade of this book? The
Winnetka Graded Book List uses the median, which places
this book in the fourth grade. Similarly, the
distributions of ballots on the second and third books indicate
that they are most widely read in the third grade, while
again the median places them in the fourth. Similar
cases are presented in Table 9 for three books showing
a mode by inspection in the second grade. Again, the
median places them too high.

TABLE 9

DISTRIBUTION OF BALLOTS AFTER APPLICATION OF CORRECTIONS
FOR THREE BOOKS SHOWING MODE BY INSPECTION
IN THE SECOND GRADE


Book     Distribution of ballots by grade                Median
number   II   III   IV    V     VI    VII   VIII  IX

1        119  53    ?     24    7     1     1     2      3.10
2        ?    12    11    6     2                        3.21
3        ?    27    12    2                              3.13

More extreme cases illustrating the opposite tendency 
of the median are presented in Table 10 for three books 
showing a mode by inspection in the eleventh grade. 

TABLE 10

DISTRIBUTION OF BALLOTS AFTER APPLICATION OF CORRECTIONS
FOR THREE BOOKS SHOWING MODE BY INSPECTION
IN THE ELEVENTH GRADE


Book number   Distribution of ballots by grade (III IV V VI VII VIII IX X XI)   Median

1 

1 2 


5 10 11 

12 

17 

35 

10.26 

2 


1 

2+10 

18 

49 

£3 

10.76 

3 

2 

7 

n 22 2+ 

38 

35 

SC 

9.82
The medians here place these books in the ninth and
tenth grades.

These cases, while not entirely typical, illustrate the
errors involved in the median. Books most widely read
in the second, third, and fourth grades tend to be placed
too high by the median. Books most widely read in
the ninth, tenth, and eleventh grades tend to be placed
too low by the median. The explanation is simple.
Books most suitable for the third grade are often read
by fifth- and sixth- and occasionally by eighth- and
even ninth-graders. The stringing out of ballots in the
upper grades is not compensated by a normal
distribution of ballots below the second grade. This
explanation is even clearer in the case of the three books of
Table 10. These books are in reality adult books.
They are often read by eighth-, ninth-, and
tenth-graders, and occasionally by still younger children.
The conditions under which the Winnetka list was
prepared give opportunity for ballots on such books
from immature readers and no opportunity for adult
readers.

It has been indicated that the distributions are not
entirely typical. They are not typical for the reason
that clean-cut modes by inspection are not common.
In less than one-third of the distributions (after
correction) does any grade have the advantage over
another grade in number of ballots by as much as 10%
of the total ballots. How, then, is the grade in which a
book is most widely read to be determined? The authors of the
Winnetka list overlooked the possibility of calculating
the empirical mode by use of the following formula
(3, pp. 60-62):
$$\text{Mode} = \text{Mean} - \frac{\text{Mean} - \text{Median}}{c}$$


in which factor c is determined from the formula 

~ 1100 .08+6 (Mmm — Median)''^ 

which for all but extremely skewed distributions equals
approximately .33. Mathematically fitting a curve to
159 distributions was out of the question. Accordingly,
the above formula, substituting .33 for c, was used to
determine the mode.
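The empirical-mode formula is easy to apply once the grouped mean and median are known. The Python sketch below does so with c = .33, using book 2 of Table 8 (corrected ballots 34, 47, 37, 29, 15, 6, 3, and 4 for Grades II to IX). As in a grouped-data calculation generally, grade g is assumed to span g.00 to g.99, so the mean uses midpoints g + 0.5 and the median is interpolated at N/2; these conventions are assumptions, not taken from the text. With them the sketch gives a median of 4.18, the value printed in Table 8, and an empirical mode of 3.59, the value the text reports for the second of the three third-grade books.

```python
C = 0.33  # approximate value of c used in the text


def grouped_mean(counts):
    """Mean of {grade: count}, using grade midpoints g + 0.5 (assumed)."""
    n = sum(counts.values())
    return sum((g + 0.5) * k for g, k in counts.items()) / n


def grouped_median(counts):
    """Median of {grade: count}; grade g is assumed to cover [g, g + 1)."""
    half = sum(counts.values()) / 2
    cum = 0.0
    for g in sorted(counts):
        k = counts[g]
        if k > 0 and cum + k >= half:
            return g + (half - cum) / k
        cum += k
    raise ValueError("distribution has no ballots")


def empirical_mode(counts, c=C):
    """Mode = Mean - (Mean - Median) / c."""
    mean, median = grouped_mean(counts), grouped_median(counts)
    return mean - (mean - median) / c


# Book 2 of Table 8: corrected ballots for Grades II-IX.
book_2 = {2: 34, 3: 47, 4: 37, 5: 29, 6: 15, 7: 6, 8: 3, 9: 4}
print(f"median {grouped_median(book_2):.2f}, "
      f"empirical mode {empirical_mode(book_2):.2f}")
```

Because the distribution is skewed toward the upper grades, the mean exceeds the median, and dividing that gap by .33 pushes the estimated mode back below the median into the third grade.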

To illustrate the effect of this second correction, the
formula may be applied to the data of Tables 8 and 9.
For the three books showing a mode by inspection in
the third grade and medians in the fourth, the
empirical mode yields the following grades: 3.61, 3.59, and

TABLE 11

EFFECT ON WINNETKA GRADING OF CORRECTION FOR THE UNEVEN
DISTRIBUTION OF BALLOTS AND FOR THE USE OF THE
MEDIAN INSTEAD OF THE MODE

Variables

1. Median grades calculated from original Winnetka distributions.
2. Median grades calculated from corrected Winnetka distributions.
3. Modal grades calculated from corrected Winnetka distributions.
4. Grading of Guide and rejected list.


            Distribution of books by grade
Variables   I    II   III   IV    V     VI    VII   VIII  IX    X     XI    Mean*   S.D.*

1           0    0    3     37    72    44    2     1     0     0     0     5.57    .815
2           0    12   21    33    49    28    13    2     1     0     0     5.28    1.518
3           11   10   26    39    39    26    4     3     0     0     1     4.77    1.606
4           8    16   24    41    43    14    9     2     2     0     0     4.73    1.622


*Winnetka means and standard deviations calculated from a finer
grouping.
TABLE 12

CORRELATIONS OF GRADING OF GUIDE AND REJECTED LIST WITH
WINNETKA MEDIANS AND MODES FROM ORIGINAL AND
CORRECTED DISTRIBUTIONS


Correlations of Guide with:*
    Original Winnetka medians        .673 ± .029
    Corrected Winnetka medians       .688 ± .028
    Corrected Winnetka modes         .672 ± .029

Original Winnetka medians with
    final Winnetka modes             .883 ± .012


*All correlations calculated from finer groupings than those presented in
Tables 11, 13, 14, and 15.


3.97. For the three books showing a mode by
inspection in the second grade and medians in the third, the
empirical mode yields the following grades: 2.19, 2.60,
and 2.89.

The effect of this correction on the grading of the
159 books is presented in Tables 11 and 12. For
purposes of comparison these tables recapitulate the data
of Table 7. The mean of the grades assigned by the
Guide (variable 4, Table 11) to the 159 books is 4.73,
with a standard deviation of 1.622. The mean of the
original Winnetka medians (variable 1) is 5.57, with a
standard deviation of .815. Successive corrections of
the Winnetka grading gradually eliminate these
discrepancies until almost perfect agreement is reached
in the final correction. That the mode lowers the final
mean grade is due to the fact that the 159 books are
concentrated for the most part in the lower grades.

It should be pointed out that in a few cases the
distributions are so irregular that the mode introduces
more error than it corrects for. This is probably true
of the one book which is placed in Grade XI according
to the final correction (see variable 3, Table 11). The
distribution of ballots after correction on this book is
as follows:

Grade:     IV   V    VI   VII   VIII   IX   X

Ballots:   1    2    ?    12    ?      ?    ?

The straggling three cases in Grades IV and V and the
uneven distribution in Grades VII to X result in a
mode of 11.07. There are, of course, no ballots on this
book from eleventh-graders. This book is placed in
the fifth grade by the Guide. Although it is tempting
to smooth the distribution of ballots, this has not been
done.

Apparently more serious errors are involved in the
eleven books (see variable 3, Table 11) which the
mode places in the first grade. The sum of the
distribution of ballots (after correction) on these eleven
books is as follows:


Grade:     II    III   IV    V     VI    VII   VIII  IX    X

Ballots:   ?     142   ?     99    39    15    9     6     4


Summation of the ballots in this manner smooths the
distribution and frees the mode from the disturbing
influences of irregularities in the individual cases. For
this distribution the median grade is 2.92 and the modal
grade is 1.62. Again, the mode places books in a grade
from which no ballots were returned. Whether it is
sound to call these first-grade books is an open
question. Calling them second-grade books changes the
mean grade of the 159 books by only five-hundredths of
a grade. We are far enough from the original data.
Further refinements would introduce only slight
changes, and it is doubtful whether they would bring
us closer to the true situation.

Table 12 presents the corresponding correlations. It
is to be observed that marked shifting of the means and
standard deviations of the Winnetka grading does not
alter the correlations. This is important evidence that
the corrections applied have not done violence to the
data. Tables 13, 14, and 15 display the three most
important of these correlations. Tables 13 and 14 are
the scatterdiagrams of the Guide with the original
Winnetka medians and with the final Winnetka modes.
According to Table 13 only 37 books are placed in the
same grade by the Guide and original Winnetka
medians. In Table 14, this number is 53. In Table 13 the
average difference between the Guide and the original
Winnetka medians is 23% better than chance
arrangement, while in Table 14 the average difference between
TABLE 13

ABBREVIATION OF SCATTERDIAGRAM SHOWING CORRELATION
BETWEEN THE GRADING OF THE GUIDE AND THE
ORIGINAL WINNETKA MEDIANS


Guide

I II III IV V VI VII VIII IX X XI


I 

I 

1 (. 20 J 6 2 1 

1 J M 27 2t 4 2 

t. 1 1 0 S I I I 

I 1 




XI 

X 

[X 

VIII 

VII 

VI 

V 

IV 

in 

11 

i 
TABLE 14

ABBREVIATION OF SCATTERDIAGRAM SHOWING CORRELATION
BETWEEN GRADING OF THE GUIDE AND FINAL
WINNETKA MODES


r 



XI 

X 

IX 

vm 

VII 

VI 

V 

IV 

in 

11 

i 


1 
1 

+ 

2 


(1 III IV 


Guide

V VI vn VIII IX X XI 


I 1 

4 

I 1 10 5 4 

I 1 3 ;7 5 2 

3 9/^82 
5^9211 

4 Z 

4 3 Z 


the Guide and the final Winnetka modes is 44% better
than chance arrangement. Table 15 illustrates the
tendency of the final mode to lower the grading of books
originally placed in the lower grades and to raise the
grading of books originally placed in the higher grades.


TABLE 15

ABBREVIATION OF SCATTERDIAGRAM SHOWING CORRELATION BETWEEN
GRADING OF ORIGINAL WINNETKA MEDIANS AND FINAL
WINNETKA MODES


Final Winnetka Modes

I II III IV V VI VII VIII IX X XI




XI 

X 

IX 

VUI 

vn 

VI 

V 

IV 

in 

11 

I 


i 

/ 1 

n 26 ^ z 

11 32 26 

15 7 
It will be worth while, in addition to the explanations
already discussed, to point out two additional
uncertainties in the grading of the Winnetka Graded Book
List. In the first place, 25 original ballots are a small
number from which to determine the best grade. This
is readily demonstrated. The correlation between the
grading of the Guide and the Winnetka list, using the
finally corrected modes, was determined for books
having less than 50 ballots originally and for cases having
50 or more ballots. The 75 cases showing less than 50
ballots yield a probable error of estimate of .93 of a grade
from the Winnetka list. The 84 cases showing 50 or
more ballots yield a probable error of estimate of .68 of
a grade. That is, the grading of books with less than
50 ballots contains about 36% more error than the
grading of books with more than 50 ballots. This is a large
difference. It demonstrates that the relative grading
of the Winnetka list becomes much more accurate
where 50 or more original ballots were obtained.

A second cause of uncertainty in the relative grading 
of the Winnetka list is that, when children of all grades 
of reading ability are given the freedom of books of all 
degrees of difficulty, the resulting distribution of ballots 
is often spread over the entire range of grades and is 
occasionally more rectangular than normal. Of the 159 
books, the distribution of ballots spreads over the whole 
10 grades in 18 cases. In 22 cases the range is 9 grades. 
In 34 cases the range is 8 grades. The occasional rec- 
tangular distribution of ballots (after correction) is 
well illustrated by the following extreme cases selected 
from among the 159: 





Distribution of Ballots by Grade

              II   III   IV    V     VI    VII   VIII  IX    X     XI

Book No. 1    59   53    54    59    53    25    16    14    3     ?
Book No. 2    8    15    14    29    15    21    21    12    14    7
Book No. 3    8    15    6     10    12    17    8     20    7     7
Book No. 4    17   22    10    12    10    8     3     ?     3     7


Statistical determination of the best grade of these books by any method whatever is certain to result in serious errors.

The conclusion of these three chapters is that discrepancies in absolute grading are due to inadequate sampling of ballots and to faulty statistical analysis of the Winnetka data. When corrections are applied, the disagreements vanish. Instead of being mutually contradictory, grading based on competent opinion and grading based on children’s choices are mutually supporting. Chapter VIII will present further evidence on this point.



V 


JUDGING THE RELATIVE WORTH OF
BOOKS

In this and the following chapters the center of interest turns from the question of agreement concerning the grading of various books to the question of their merit or general worth. Both undertakings list only books of supposedly genuine worth. In the case of Volume I of A Guide to Literature for Character Training, 461 titles are recommended according to five degrees of merit, and an additional 399 titles, classified in four levels of demerit, failed of recommendation. The Winnetka Graded Book List employs two procedures for the selection of the supposedly best books. The first and primary test for inclusion of a book is that at least 25 children must have returned ballots upon it. The number of children reading a book and the number of cities in which it was read determined the popularity index, and books are listed according to this index. The second and supplementary test of the worth of a book was the opinion of 13 expert children’s librarians. One hundred ten books which three-fourths of these librarians thought unsuitable were excluded from the list. One hundred nineteen which three-fourths of the librarians thought to be of “unquestionable literary merit” were starred. Once more we find the two undertakings attempting to do the same thing by very different methods. The Guide relies exclusively on competent opinion to determine the merits and demerits of the books subjected to study. The Winnetka list relies primarily upon the choices of children and employs competent opinion only in a supplementary and negative fashion. Again it is asked: do the lists agree? If not, why not? It will be convenient to consider first the reliability of the judgments of the librarians and their agreement with the Guide, while Chapter VI will consider the question of agreement between judgments of worth by adults and children’s choices.

“Just what is ‘literary merit’ anyhow?” With this question Carleton Washburne and Mabel Vogel sum up the results of their study of the estimates of literary worth by 13 expert children’s librarians on the books studied in the preparation of the Winnetka Graded Book List.¹⁶ “The reports of these experts varied materially. Out of about 800 titles submitted to them there were only about 100 on which they all agreed. . . . If a group of children’s librarians, selected as among the most expert in the United States, differ among themselves as to what books have high literary merit and what ones are trashy, does it not show that none of us are able to set up as yet any final and generally acceptable standard of literary merit?”¹⁷

The first question at issue is the presence or absence of agreement among the librarians. If their estimates prove reliable, the second question is whether they agree with the rankings in the Guide. Two sets of data are available to answer these questions. Through the courtesy of Mr. Washburne, we have the complete record of the original estimates of the librarians on all books. Through the courtesy of four librarians, we have the revised estimates on books common to the Winnetka list and those evaluated in the preparation of the Guide.

¹⁶Page 44, original edition.

¹⁷Pages 43 and 44, original edition.

The general procedures employed in the preparation of the Winnetka list have been described. It is necessary here only to add a more detailed explanation of how the librarians’ judgments were obtained. They were given an alphabetical list of all books and were asked to mark each title 1, 2, 3, or 4, indicating whether the book was

(1) of unquestionable literary merit.

(2) valuable for the list although not of high literary merit.

(3) not recommended because of low literary merit.

(4) not recommended because of subject matter.¹⁸

If a book was unfamiliar, they were to indicate the 
fact with a question mark. These instructions were 
supplemented by sample titles illustrating the above 
definitions. 

Before presenting the data obtained from these instructions, two comments are pertinent. It will be observed that the librarians were asked to judge two separate things and to judge them on the same scale. Accordingly, the four-point scale is not strictly a scale at all, but calls for four unrelated and sometimes conflicting judgments. If separate estimates of literary merit and of content value were desired, two scales should have been provided. Secondly, the instructions do not ensure that the books will be distributed over a reasonable range of steps.

¹⁸See page 42, original edition.

In spite of the unsatisfactory data, statistical treatment yields a very high degree of reliability. The procedure was to consider only those titles upon which at least eight librarians recorded their estimates and to correlate the average of half of these estimates with the average of the other half. For a random sample of 200 cases this correlation proved to be .862 ± .012. By Brown’s formula the estimates of the thirteen librarians should correlate .925 with those of a similar group of judges. Further, they should correlate .962 with the judgment of a very large number of similarly competent librarians. Far from being low, this reliability is exceptionally high. Indeed, it is so high as to suggest that we have here not 13 independent judgments but 13 judgments derived from a common source.
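Brown’s formula (now usually called the Spearman-Brown prophecy formula) steps a chance-half correlation up to the reliability of the whole set of judges, and the square root of that reliability estimates the correlation with the pooled judgment of an indefinitely large group of similar judges. A sketch reproducing the figures above; the function names are illustrative only:

```python
import math


def spearman_brown(r: float, k: float = 2.0) -> float:
    """Reliability of a pool k times as large as the one with reliability r."""
    return k * r / (1 + (k - 1) * r)


def index_of_reliability(reliability: float) -> float:
    """Estimated correlation with an indefinitely large pool of similar judges."""
    return math.sqrt(reliability)


r_half = 0.862                     # one half of the librarians vs. the other half
r_13 = spearman_brown(r_half)      # all 13 librarians vs. a similar group of 13
r_inf = index_of_reliability(r_13)
print(round(r_13, 3), round(r_inf, 3))
```

This prints 0.926 and 0.962; the text's .925 differs only in the last rounded digit. Applied to the chance-half figure of .825 reported for the 125 common books, the same formula gives .904, against the .905 given there.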

At the time an intensive study of the librarians’ judgments was undertaken, 125 books on which at least eight estimates were available were found common to the Winnetka list and those examined in the preparation of Volume I of the Guide.

For these 125 books, the chance-half reliability of the original librarian estimates proved to be .825 ± .091. By Brown’s formula the reliability of the 13 estimates is .905. These figures confirm the reliabilities presented for the random sample of 200 cases.

The average judgment of the librarians, correlated against the six levels of merit given in the Guide and the four levels of demerit in its rejected list, yields an r of .450 ± .048. In comparing this correlation with others to be reported, it will be convenient to take account of the range of merit involved. For this purpose we shall use the standard deviation of the ranks according to the Guide and its rejected list, which is 1.92 levels of merit. The probable error of estimate is 1.16 levels. That is, a best estimate of the Guide ranking from the original librarians’ estimates would show discrepancies with the Guide of less than 1.16 levels in 50% of the cases. We may, however, go beyond the question of obtained agreement with the Guide and pose the theoretical question: “How closely would a very large number of librarians agree with a very large number of critics such as those employed in the Institute of Character Research?” Correcting the .450 correlation for attenuation, the answer is .520.

The nature of the instructions by which the librarians originally judged the books of the Winnetka list led to a feeling that considerable agreement between the librarians and the Guide was being obscured. Accordingly, an attempt was made to obtain revised estimates. A list of the books common to the Winnetka list and Volume I of the Guide and their rejected lists was sent to the 13 librarians. They were asked to indicate the best 10% of the books, the next best 15%, the middle 50%, the next poorest 15%, and the poorest 10%. They were instructed to use their own good taste as to the factors determining the best material and to give whatever weight to interest value, literary merit, content, or other values they individually thought desirable. As a whole, the group proved uncooperative: only four librarians supplied revised estimates. But these are sufficient to give a distinctly higher reliability and a higher correlation with the Guide.

As in the study of the original estimates, the first question is that of reliability. Summing two revised estimates and correlating them with the sum of the other two yields an r of .692 ± .034 for 115 cases (10 cases were omitted because of incomplete data). While this chance-half reliability is lower than that of the original estimates (.825), only four judges are involved instead of thirteen. Correlating the original estimates of two of these same judges against those of the other two gives a chance-half reliability of only .409 ± .062 (corrected for errors of grouping). The revised estimates are, accordingly, distinctly more reliable when the number of judgments is considered.
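Stepping both chance-half figures up with Brown’s formula to the reliability of the full set of four judges makes the comparison direct. A minimal sketch; the function name is illustrative:

```python
def spearman_brown(r: float, k: float = 2.0) -> float:
    """Brown's formula: reliability of a pool k times as large."""
    return k * r / (1 + (k - 1) * r)


# Two of the four librarians against the other two, stepped up to all four.
revised = spearman_brown(0.692)   # revised estimates
original = spearman_brown(0.409)  # original estimates of the same four judges
print(round(revised, 3), round(original, 3))
```

This prints 0.818 and 0.581: with the number of judges held at four, the revised estimates are clearly the more reliable.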

How well do the revised estimates correlate with the Guide? That these revised estimates contain new elements is indicated by a correlation of only .69 with the original estimates. Averaging the four revised estimates and correlating them with the Guide yields an r of .518 ± .046 (115 cases). Not only is this correlation slightly higher but, with 10 cases omitted, the standard deviation is smaller, being 1.61 levels. It follows that the probable error of estimate is distinctly smaller, being .93 instead of 1.16 levels. Again it is of interest to go beyond the question of obtained rankings and to pose the theoretical question of how well a very large number of librarians would agree with a very large number of critics such as those employed in the Institute. Correcting the .518 correlation for attenuation yields an r of .631.
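Both probable errors of estimate quoted in this chapter follow from the classical formula PE = 0.6745 σ √(1 − r²), where σ is the standard deviation of the Guide rankings and 0.6745 converts a standard error into a probable error. A short check, assuming this is the formula used:

```python
import math


def probable_error_of_estimate(r: float, sd: float) -> float:
    """Half of all estimates are expected to miss by less than this amount."""
    # +/- 0.6745 standard deviations bound the middle 50% of a normal curve.
    return 0.6745 * sd * math.sqrt(1 - r * r)


original = probable_error_of_estimate(0.450, 1.92)  # original librarian estimates
revised = probable_error_of_estimate(0.518, 1.61)   # revised estimates, 115 books
print(round(original, 2), round(revised, 2))
```

This prints 1.16 and 0.93, the two values given in the text.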





It would seem to follow from these data that the Winnetka list might well have employed competent opinion in a much more primary and positive role instead of in a supplementary, secondary, and negative role. The data also indicate that, as tested by the competent opinion of librarians, the rankings as to merit in the Guide are trustworthy. There remains, however, some force in the question, “Just what is literary merit anyhow?” Corrected for attenuation, the highest correlation is only .63. While librarians agree very well among themselves and while the readers in the Institute agree among themselves, there is still much disagreement between the two groups. Even if the librarians’ estimates were combined with those of the Guide, the combination would not correlate higher than .68 with the combined opinion of two more similarly competent groups showing similar agreements.



VI


AGREEMENTS BETWEEN MERIT AS ESTIMATED
BY COMPETENT OPINION
AND BY INTEREST VALUE

In the preceding chapter evidence was presented 
showing that the judgments of merit by the 13 libra- 
rians are highly reliable and that they agree rather well 
with similar estimates of merit by the readers who 
prepared the recommendations of the Guide. We may
now inquire whether the selection of best books by com- 
petent opinion gives the same result as selection of best 
books on the basis of children's choices. 

It should be remarked at once that agreements or 
disagreements here suggest very different conclusions 
than agreements or disagreements as to grading. If 
grading by competent opinion and grading on the basis 
of the reading grade ability of children reading certain 
books had shown no agreement, the result would have 
been clearly disastrous to one or the other or both 
methods of grading. Two methods of grading books 
cannot give totally different results and both methods 
be valid. When it comes to selecting superior books, 
however, the absence of agreements between the literary 
excellence or general worth of books as judged by ma- 
ture persons and the appeal of these books to children 
does not invalidate either basis, since they do not purport to measure the same thing. Rather, the absence of agreement between judgments of merit and interest value points directly to the necessity of preparing selected lists on the basis of both criteria.
In the original studies designed to validate the rankings of books according to the levels of merit given in
the Guide and its rejected list we made a large number 
of studies of the interest value data published in the 
Winnetka Graded Book List and its rejected list. No 
matter how these data were manipulated or what cor- 
rections were applied, the same results were obtained. 
Accordingly, there is reported here only one study. In 
addition, for the purpose of this volume, we have made 
a parallel analysis of the relation between merit as 
judged by the 13 librarians and the various indices of 
interest value as given in the Winnetka list. 

There are no less than five possible indices of chil- 
dren’s choices or of interest value recorded in the Win- 
netka list. Each of these merits a brief description. 

1. The index which is given the greatest prominence in the Winnetka list is the number of children reading and reporting on a book. It is, however, not an entirely satisfactory index, since it measures availability to an uncertain degree.

2. The number of cities in which a book was read 
is a possible index of the interest value of a book. Some 
books were read in 34 cities while others were read in 
only one city. 

3. The Winnetka list reports a “popularity index,” which was obtained by multiplying the number of children reading and liking a book by the number of cities in which it was read. The books are listed in order according to this index.

4. In reporting on a book each child checked one of the following statements:

One of the best books I ever read.

A good book, I like it.

Not so very interesting.

I don't like it.

The percentage of children checking the first two state- 
ments constitutes a fourth measure. 

5. A fifth measure of interest was obtained by assigning numerical values of 100, 67, 33, and 0 to the above statements and averaging. The fourth and fifth measures, accordingly, are simply different methods of scoring the same data and should agree very closely.
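The five indices can be computed from a single book’s ballots as sketched below. This is illustrative only: the class and field names are hypothetical, the counts are invented, and "liking" is taken to mean checking either of the first two statements, which is how the text defines the fourth measure.

```python
from dataclasses import dataclass


@dataclass
class BookBallots:
    """Hypothetical ballot counts for one book, one count per statement."""
    best: int      # "One of the best books I ever read."
    good: int      # "A good book, I like it."
    not_very: int  # "Not so very interesting."
    dislike: int   # "I don't like it."
    cities: int    # index 2: number of cities in which the book was read

    def ballots(self) -> int:
        """Index 1: number of children reading and reporting on the book."""
        return self.best + self.good + self.not_very + self.dislike

    def liking(self) -> int:
        return self.best + self.good

    def popularity_index(self) -> int:
        """Index 3: children reading and liking the book, times cities."""
        return self.liking() * self.cities

    def percentage_liking(self) -> float:
        """Index 4: percentage checking one of the first two statements."""
        return 100 * self.liking() / self.ballots()

    def interest_value(self) -> float:
        """Index 5: average of the values 100, 67, 33, and 0."""
        total = 100 * self.best + 67 * self.good + 33 * self.not_very
        return total / self.ballots()


book = BookBallots(best=10, good=20, not_very=6, dislike=4, cities=5)
print(book.ballots(), book.popularity_index(), book.percentage_liking(),
      book.interest_value())
```

Note that indices 1 to 3 grow with the number of readers reached, while indices 4 and 5 are ratios that do not.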
What do these five indices measure? Table 16 displays their intercorrelations for 156 books which the Winnetka list places in Grade V and for 156 books in Grade VI, on which at least eight librarians’ judgments are available. The number of ballots and the number of cities are highly correlated, to the extent of .817 and .767. Since the popularity index is the product of ballots and cities, it necessarily correlates highly with its components: .953, .956, .911, and .836. Necessarily, also, the percentage of children liking a book and its average interest value correlate very highly, .800 and .826; but the cross correlations are exceptionally low. Number of ballots, number of cities, and popularity correlate on the average only .189 with the percentage of children liking a book and the average interest value. Obviously, the two sets of figures do not measure the same thing.

TABLE 16

Intercorrelations of Five Indices of Interest Value for 156
Fifth-Grade Books and for 156 Sixth-Grade Books

                                  Number of   Popularity   Percentage   Interest
                                  cities      index        liking       value
Number of ballots, Grade V          .817        .953         .193          ?
Number of ballots, Grade VI         .767        .956         .106          ?
Number of cities, Grade V                       .911         .221          ?
Number of cities, Grade VI                      .836         .217        .329
Popularity index, Grade V                                    .209          ?
Popularity index, Grade VI                                   .142        .203
Percentage liking, Grade V                                               .800
Percentage liking, Grade VI                                              .826

Which set of figures measures the appeal of the books 
to the interest of children? Or does neither set measure 
this factor? A preliminary question is the reliability 
of the measures, that is, whether they measure anything 
at all. From the fact that the number of ballots and 
the number of cities are independent measures and from 
the fact that they correlate so well, we are safe in in- 
ferring that these measures are fairly reliable. Accord- 
ingly, the popularity index must also be reliable. The 
Winnetka Graded Book List, however, gives no clue as to the reliability of the percentage-liking and interest-value indices. In the absence of a better measure, the reliability of the percentage liking has been determined indirectly by first calculating the reliability of each percentage and then combining. The estimated reliability turns out to be .72 for Grade V and .77 for Grade VI.¹⁹ Presumably it follows that the interest-value index is also fairly reliable.


¹⁹In detail, the logic is as follows: The reliability of a distribution of scores is defined as the percentage of true variance in the variance of the obtained distribution, or
The most probable interpretation of the very low cross correlations is that the number of ballots returned, the number of cities reporting, and the popularity indices are primarily measures of availability. That availability is certainly involved, at least in part, was demonstrated in our studies of the material of Volume I of the Guide by comparing the number of ballots on books in relation to date of publication: books published prior to 1915 averaged nearly 30 more ballots than books published after 1915. That the number of cities reporting correlates so highly with the number of ballots returned also points to availability as the better interpretation. This is not to say that the school and city libraries of the 34 cities involved have a limited range of reading material, but it is certain that not all of the more than 9000 books which received at least one ballot were equally available to all of the 100,000 children who read various books.

Doubts may also be cast on the percentage-liking and interest-value indices. The average of the percentage-liking indices is 87.5. That is, only about 13% of the children checked either “Not so very interesting” or “I don’t like it” in reporting on a book. Further, the introduction to the Winnetka Graded Book List²⁰ reproduces a curve showing that the percentage of children liking a book did not change from grade to grade, although some change might have been expected. In spite of these two uncertainties, the author is inclined to the judgment that the percentage-liking and interest-value indices are more valid measures of the true appeal of books than the number of ballots, the number of cities, or the so-called popularity indices.

With this provisional interpretation in mind, we consider the correlations between the various measures of interest value and competent opinion. These are reported in Table 17. The first two columns give the correlations between the indices and the estimates of the 13 librarians. Columns three, four, and five record the correlations between the indices and the levels of merit according to the Guide and its rejected list. The 24, 53, and 34 books are all those placed in Grades IV, V, and VI by the Winnetka list among the 125 books common to the Guide and the Winnetka list. On the whole, competent opinion correlates positively with number of ballots, number of cities, and popularity, the values ranging from −.006 to .540, the weighted averages being .246, .167, and .213. On the whole, competent opinion correlates zero or negatively with the percentage-liking and interest-value indices, the r’s ranging from −.370 to .310, the weighted averages being −.254 and −.207. On the whole, the correlations between the indices and merit as recorded in Volume I of the Guide and its rejected list are slightly more positive than the correlations between the indices and the librarians’ estimates of merit. This is due to the fact that the materials of Volume I are fairy tale, myth, and legend. Books that are poor in the eyes of adults but make strong appeals to children are much less common in the field of the fairy tale than in the whole area of children's books.

²⁰Page 21, original edition.

These data tend to confirm the interpretation of the 
intercorrelations of the Winnetka indices. That com- 
petent opinion gives slightly positive correlations with 
number of ballots, number of cities, and popularity may 
be due to the dependency of availability upon the 
recommendations of librarians and others who prepare 
lists of books for purchase by school and city libraries.

TABLE 17

Correlations of Competent Opinion with Various
Indices of Interest Value

                          Librarians                The Guide
                     Grade V   Grade VI   Grade IV   Grade V   Grade VI   Weighted average
Number of ballots      .282      .159        ?         .377      .351       .246 ± .031
Number of cities       .158      .041      −.006       .421      .540       .167 ± .032
Popularity index       .260        ?        .035       .425      .417       .213 ± .011
Percentage liking     −.295        ?        .026      −.076      .077      −.254 ± .031
Interest value        −.370     −.377      −.069      −.103      .310      −.270 ± .030
The negative correlations between competent opinion and the percentage-liking and interest-value indices were a complete surprise. There is no way of interpreting these
data so as to yield the conclusion that there is any sub- 
stantial correlation between what children like and 
what adults approve. This does not prove, but it does 
warrant the inference, that lists of best books should be 
selected on the basis of both competent opinion and 
interest value. 



VII 


EXPERIMENTAL RESULTS: THE CORRELATION
BETWEEN JUDGMENTS OF MERIT
AND INTEREST VALUE

In this and the following chapter additional data are 
presented which confirm and supplement the conclu- 
sions of the previous chapters. 

Early in our studies of the fairy-tale material in the 
preparation of Volume I of the Guide, an exploratory
experiment was set up for the purpose of testing the 
reliability of three methods of measuring the interest 
value of individual stories. Individual stories were se- 
lected for study since these could be mimeographed and 
read by large numbers of children without excessive ex- 
pense. It was planned after testing the reliability of 
the three measurements of interest value to undertake 
further studies of (1) the correlation between judg- 
ments of merit and interest value and of (2) the corre- 
lation between grade placement and interest value. 
The further precise studies of the correlation between 
judgments of merit and interest value, most unfortu- 
nately, were never undertaken. The exploratory ex- 
periment, however, provides data which are worthy of 
presentation in this connection. The present chapter
summarizes these data, while Chapter VIII presents the
results of a more precisely controlled study of the cor- 
relation between grade placement and interest value. 

The exploratory study was designed to test the reliability of three methods of measuring the interest value of individual stories. For the purpose of the experiment, 36 stories were selected and mimeographed. These stories were selected from among 157 which were the first ones to be subjected to systematic analysis by the staff of the Institute of Character Research early in 1926. The selection was primarily for moderate length and variety of types. As then judged by the staff, all of these stories were of average or very superior merit, but later evaluations in the light of all
the fairy-tale material showed that the entire range of 
merit was included. The stories were divided into two 
groups of 20 stories, each group having four stories 
common to the other. Each of these groups was further 
divided into four sets of five stories each. These divi- 
sions were purely mechanical, the effort being to have 
each set contain the same amount of reading material. 
Within each set, the stories were arranged in three 
different orders to avoid any position error. One group 
of 20 stories was read by 115 children from the third
to the sixth grade inclusive in the schools of Spencer, 
Iowa, under the supervision of Superintendent J. R. 
McAnelly. The other group was read by 208 children 
from the third to sixth grade inclusive in the schools 
of Mason City, Iowa, under the supervision of Superin- 
tendent F. J. Vasey. 

The children made three reports on each story as to its interest value. Method I was as follows: after each story was read, they were asked to indicate its interest value on a scale of five statements, thus:

.... One of the best stories I ever read.

.... A good story, I like it.

.... I neither like nor dislike this story.

.... Not so very interesting.

.... I dislike this story very much.
Method II consisted of indicating the best-liked and
the least-liked story out of each set of five. When the
reading of the group of 20 stories was completed the 
children were given a list of the stories and were asked 
to indicate the three best-liked and three least-liked 
stories of the 20. These reports constituted Method 
III. A uniform procedure was used throughout, the 
teachers being particularly warned not to indicate in 
any way their own attitude toward any story. The 
children were given all the time they desired. 

The reports from each of the three methods were 
tabulated separately. Responses to Method I were as- 
signed values of 5, 4, 3, 2, and 1 for each of the five 
steps from liking to disliking. On Methods II and 
III, a value of three was given to each story rated as 
“best-liked,” a value of one was given to each story 
rated as “least-liked,” and a value of two to all other 
stories. These scores were summed and divided by the 
number of children reporting in order to obtain an 
average. The Mason City and Spencer data on the 
four stories read in both schools were combined and the 
36 stories treated as a unit. 
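The scoring just described can be sketched as follows. The response labels and function names are hypothetical; the point values are those given in the text.

```python
from statistics import mean

# Method I: five statements scored from liking (5) to disliking (1).
METHOD_I_VALUES = {
    "best": 5,      # One of the best stories I ever read.
    "good": 4,      # A good story, I like it.
    "neither": 3,   # I neither like nor dislike this story.
    "not_very": 2,  # Not so very interesting.
    "dislike": 1,   # I dislike this story very much.
}


def method_i_score(responses: list[str]) -> float:
    """Average Method I value of one story over the children reporting."""
    return mean(METHOD_I_VALUES[r] for r in responses)


def best_least_score(n_best: int, n_least: int, n_children: int) -> float:
    """Methods II and III: 3 for best-liked, 1 for least-liked, 2 otherwise."""
    n_other = n_children - n_best - n_least
    return (3 * n_best + 1 * n_least + 2 * n_other) / n_children


print(method_i_score(["best", "good", "good", "neither", "dislike"]))
print(best_least_score(n_best=30, n_least=10, n_children=100))
```

Because unmarked stories score 2, the Methods II and III average reduces to 2 + (best − least)/n, so a story rises above 2 only when it is named best-liked more often than least-liked.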

The reliability of each method for each grade was determined by dividing the children into chance-half groups and applying Brown’s formula. The results are presented in Table 18. From the table, the three methods are seen to be of nearly equal reliability, except in the third grade, where the checking of the three best-liked and three least-liked stories out of 20 has the advantage. The very high reliabilities, all over .95, obtained by combining the data for the fourth, fifth, and sixth grades are to be noticed. The degree to which the three methods measure the same thing was determined by calculating the intercorrelations for the combined data of the fourth, fifth, and sixth grades. Method I with II gives a correlation of .84, I with III gives .88, and II with III gives .94. A combination of these three measures should correlate .96 with three more methods yielding similar intercorrelations.
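The .96 figure is consistent with applying Brown’s formula to the average of the three intercorrelations, treating the three methods as parallel measures; the text does not say exactly how it was computed, so this is an assumption. The square root of the result is the theoretical validity of .98 quoted in the next paragraph.

```python
import math
from statistics import mean

# Intercorrelations of the three methods, Grades IV-VI combined.
intercorrelations = [0.84, 0.88, 0.94]  # I-II, I-III, II-III

r_bar = mean(intercorrelations)
k = 3  # number of methods pooled
pooled = k * r_bar / (1 + (k - 1) * r_bar)  # Brown's formula
validity = math.sqrt(pooled)                 # square root of the reliability
print(round(pooled, 2), round(validity, 2))
```

This prints 0.96 and 0.98.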

The procedure itself and the internal consistency of the data are the best evidence of the validity of these measures of interest value. The theoretical validity (square root of the reliability) is .98. As a supplementary test, eight literary critics in the Institute of Character Research, who had had six months’ intensive training in judging these materials for their general worth, were asked to estimate the interest values of these stories. As an aid they were given an adaptation of Uhl’s (7) standards of interest value, which placed the emphasis on “dramatic, exciting, and adventurous action” and on “clearly drawn characters portraying kindness and faithfulness.” The reliability of their judgments proved to be .64. These estimates of interest value correlated .590 with a pool of the three methods of measurement. Corrected for attenuation, this becomes .752.
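The correction for attenuation divides the obtained correlation by the geometric mean of the two reliabilities. Taking .64 for the critics and, as an assumption, the .96 estimated for the pooled children’s measures reproduces the corrected value:

```python
import math


def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Estimated correlation between two measures if both were error-free."""
    return r_xy / math.sqrt(r_xx * r_yy)


corrected = correct_for_attenuation(0.590, r_xx=0.64, r_yy=0.96)
print(round(corrected, 3))
```

This gives .753, matching the reported .752 to within rounding of the inputs.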

Before presenting the correlation between the pooled measures of interest value and adult estimates of merit, it should be remarked once more that the data were collected for the purpose of testing the reliability of the three methods of measuring interest value and not for the purpose of testing the relationship between merit and interest value. It follows that the data fall short of what might be desired in several respects. First, the number of stories involved is too small. Secondly, while the sample of stories contains a wide variety of tales, it is slightly overweighted with fables. Thirdly, the stories are spread over four school grades: there are eight third-grade stories, twenty-one fourth-, five fifth-, and two sixth-grade stories as judged by the staff readers. A more desirable selection would have been confined to stories all of the same grade. Fourthly, in the selection of stories no effort was made to ensure a representative sampling of the various levels of merit, with the result that the range of merit involved is too large.

TABLE 18

Reliability of Children’s Statements as to the Interest
Value of Stories for Each Grade and Each Method of
Measurement

N = 36

Grade        Method I   Method II   Method III
III            .656       .594        .851
IV             .877       .883        .871
V              .900       .949        .864
VI               ?        .931        .915
IV-V-VI        .962       .974        .952

Two measures of merit are available. One consists of the degrees of merit assigned by the Guide, supplemented by the judgments on rejected stories of two readers who had charge of making all the final adjustments in the rankings of stories as published in the Guide. The second consists of special estimates made by the entire staff of readers. No direct measure of the reliability of the first measure is available, but it is probably below that of the second, which is .85.

Any obtained correlation between interest values and 
estimated merit will depend largely on the presence or 
absence of diflerei]ccs in interest value and merit. It 
may be inferred from the very high reliabilities that as 
to interest value these stories represent genuine differ- 
ences. The distribution according to the first measure 
of merit is more than adequate with a standard devia- 
tion of 2.09 in comparison with an estimated standard 
deviation of 1.60 for all the 6000 stories which were 
evaluated, With the presence of genuine differences 
in interest value and merit the stage is set to yield 
maximum correlation. The pooling of all three meas- 
ures of interest value and of responses from Grades IV 
to VI correlates .305±.102 with the special estimates 
of the entire staff and .254-=t.l06 with the rankings as 
published in the Guide supplemented by the judgments 
of two readers. The most significant of these corre- 
lations falls short of being three times as large as its 
probable error. In view of the large differences
involved and the unreliability of the correlations, the
data indicate that for these 36 stories there is no rela- 
tionship between merit as judged by competent adults 
and the interest value of the stories for children. It is, 
of course, extremely hazardous to generalize from only 
36 stories as to the true relationship for the some 6000 
tales in the field which were carefully examined by the 
staff of the Institute. 
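The three-probable-error criterion used above can be checked with the classical formula for the probable error of a correlation, PE = 0.6745(1 - r²)/√N. The monograph does not print the formula. The sketch below assumes it was applied with N = 36 stories. Under that assumption it reproduces the ±.102 reported for r = .305. The function names are illustrative only.

```python
import math

def probable_error_r(r: float, n: int) -> float:
    """Classical probable error of a Pearson r: 0.6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

def pe_ratio(r: float, n: int) -> float:
    """How many probable errors the correlation lies away from zero."""
    return r / probable_error_r(r, n)

# Pooled interest values vs. special staff estimates, 36 stories (figures from the text).
pe = probable_error_r(0.305, 36)
print(f"PE = {pe:.3f}, r/PE = {pe_ratio(0.305, 36):.2f}")  # PE = 0.102, r/PE = 2.99
```

A ratio just under 3 is what the text means by a correlation that "falls short of being three times as large as its probable error."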



VIII 


EXPERIMENTAL RESULTS: GRADE PLACEMENT
AND INTEREST VALUE

The present chapter reports experimental data em- 
ploying a limited number of individual stories to sup- 
plement the material of Chapters II to IV concerning 
grade placement. Fundamentally, the issue involved
is whether competent persons can place reading
materials at their correct absolute level in the several
school grades. 

The procedure employed took its cue from certain 
results of the exploratory study of methods of measur- 
ing interest values which hinted that stories best-liked 
in one grade were not the ones best-liked two or three 
grades removed. Twenty tales were selected for study. 
All were of approximately the same length, the same 
level of merit, and all were distinctly fairy tales. They 
consisted of four stories presumably most suitable for 
second-grade children, four stories for the third grade, 
four for the fourth, four fifth- and four sixth-grade 
stories. This grading was carefully verified by the 
staff of readers of the Institute of Character Research
at the University of Iowa, and any story which fell halfway
between two grades was excluded from the selection.
From other studies the probable error in the relative
grading of these stories is estimated at four tenths
of a grade. That is, relative to every other story, the 
chances are that not more than two or three of the 
twenty should be moved up or down one grade to
conform to a theoretically true relative grading.
Accordingly, for the purpose of this chapter, we shall assume
that the relative grading is highly accurate. It is not 
assumed, however, that the absolute grading is accurate 
since the problem here is to determine whether the 
group of twenty tales as a whole should be moved up or 
down in its grade placement to conform to a true absolute
grading.* Accordingly, we shall designate the
tales by the non-committal letters A, B, C, D, and E
and shall proceed to test whether they belong in Grades
I to V, or II to VI, or III to VII.

Since the preliminary study showed little difference 
in the three methods of measuring interest values, the 
simplest method was used. This consisted of asking 
children to select the one best-liked and one least-liked 
story out of a set of five. Each set of five contained an 
A, B, C, D, and E story. Eight sets were prepared 
in such a way that each story was directly compared 
with seven others. Within each of these sets the stories 
were arranged in all possible orders to avoid any posi- 
tion error. Uniform instructions, as in the previous ex- 
periment, were sent to all teachers immediately in 
charge of the reading. The stories were read during
the middle of the school year by 788 children from the 
second to the sixth grades, inclusive, in the schools of 
Mason City, Iowa, again through the courtesy of Su- 
perintendent F. J. Vasey. 

The resulting data were subjected to the same analysis
as in the first study. The reliability of the interest
indices based on the judgments of second-grade children
is only .29, but for Grades III, IV, V, and VI, the
reliabilities are .86, .88, .96, and .98, respectively. The
indices obtained from the judgments of second-grade
children correlate .504 with those of third-grade children
and then drop to .012, .018, and .027 when correlated
with the data supplied by fourth-, fifth-, and
sixth-grade children. Similarly, the indices obtained
from third-grade children correlate .781 with those of
fourth-grade children and then fall to .612 and .564
when compared with fifth- and sixth-grade indices.
The fourth-grade indices correlate .920 and .802 with
fifth- and sixth-grade indices, and the fifth-grade indices
correlate .955 with those of the sixth grade.

*See Chapter II for the distinction between relative and absolute
grading.

In order to trace the trends in the interest values of 
individual stories the indices obtained from each school 
grade of children were translated into sigma deviations 
with the mean set at zero and the standard deviation at 
one. The resulting data are displayed in Table 19. The
Roman numerals across the top of the table indicate
children in Grades II to VI. The capital letters at the
left indicate the supposedly most suitable grade for each
story, A being second grade, B third grade, etc. The 
subscripts a, b, c, and d distinguish the four stories of 
each kind. Story Aa stands 1.2 sigma deviations 
above the mean of all stories according to second-grade 
children. Only one story is more interesting to second- 
graders. The standing of story Aa among third-grade 
children, however, is 1.6 sigma deviations below the 
mean of all stories, and only one story is more disliked. 
Here, in only the next adjacent grade, is a fall from
grace with a vengeance. Similarly, all of the A, or
supposedly second-grade, stories stand higher in the eyes of
second-grade children than in the eyes of third-grade
children. The averages tell the tale of the fall in the
market value of these stories, starting at 0.1 sigma in
the second grade and dropping to -1.2, -1.3, -1.8,
and -1.8 sigma deviations in Grades III, IV, V, and VI.
The figures for the other stories are interpreted in like manner.
On the average, the four B, or supposedly third-grade,
stories reach their peak of interest in the third grade.
On the whole, the four C stories stand highest in the
eyes of third- and fourth-grade children. They are
almost as well liked by second-grade children and
distinctly less well liked by fifth- and sixth-graders. Both
the groups of D and E stories show a steady and marked
rise in interest value from the earlier to the later grades.

TABLE 19

Interest Indices in Terms of Sigma Deviations for
Twenty Stories Based on Expressions of
Interest by Children in Grades
II to VI

           Interest indices in Grades II to VI
Stories       II    III     IV      V     VI

Aa           1.2   -1.6   -2.6   -2.5   -2.0
Ab           0.0   -0.6   -1.2   -1.8   -2.0
Ac           0.0   -0.8   -0.8   -1.4   -1.7
Ad          -1.0   -1.8   -0.5   -1.?   -1.5
Average      0.1   -1.2   -1.3   -1.8   -1.8

Ba           0.0    1.7    0.7    0.3    0.2
Bb           0.0    0.7    0.3    0.3    0.4
Bc           0.0    0.0   -0.5   -0.3   -0.2
Bd           1.1    0.3   -0.6    0.2    0.1
Average      0.3    0.7    0.0    0.1    0.1

Ca           0.0    0.6    0.8    0.0    0.0
Cb           0.6    1.1    1.3    0.8    0.6
Cc           2.7    1.3    0.7    1.0    0.7
Cd          -0.4    0.3    0.3    0.2    0.4
Average      0.7    0.8    0.8    0.5    0.4

Da          -0.4   -0.2    1.1    1.3    1.5
Db          -1.3   -1.1    0.3    0.1    0.5
Dc           1.2    0.7    0.9    1.1    1.3
Dd          -0.?   -1.1    0.6    0.8    0.4
Average     -0.3   -0.4    0.7    0.8    0.9

Ea          -1.7      ?    0.1   -0.3    0.5
Eb          -0.1   -0.3   -1.2    0.2   -0.2
Ec           0.2    1.7    1.4    1.1    0.9
Ed          -1.3   -0.8   -1.1    0.1    0.1
Average     -0.8    0.1   -0.2    0.3    0.3
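The conversion of each grade's indices to sigma deviations, and the group averages in Table 19, can be sketched as follows. This is a minimal illustration. The raw indices are hypothetical numbers, because the monograph reports only the converted values. The use of the population standard deviation (dividing by N) is an assumed convention. Only the Grade III column for stories Aa to Ad is taken from Table 19.

```python
from statistics import fmean, pstdev

def to_sigma_deviations(indices: list[float]) -> list[float]:
    """Express one grade's interest indices as deviations from their mean,
    in units of their standard deviation (mean 0, SD 1).
    Population SD (divide by N) is assumed here."""
    mean = fmean(indices)
    sd = pstdev(indices)
    return [(x - mean) / sd for x in indices]

def group_average(values: list[float]) -> float:
    """Mean standing of one group of stories (e.g. Aa to Ad) within one grade."""
    return fmean(values)

# Hypothetical raw indices for five stories read in one grade (not data from the study).
raw = [62.0, 48.0, 55.0, 40.0, 70.0]
z = to_sigma_deviations(raw)
print([round(v, 2) for v in z])

# Grade III sigma values for stories Aa, Ab, Ac, Ad, from Table 19.
print(round(group_average([-1.6, -0.6, -0.8, -1.8]), 1))  # -1.2
```

Because each grade is rescaled separately, a story's movement across the columns shows only its standing relative to the other nineteen stories. It does not show any absolute change in liking.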

Save only the indices obtained from second-grade
children, the standard errors of the indices are
comparatively small, ranging from .37 to .14 sigma. But
since the indices in Grades III to VI are highly
correlated, the standard errors of the differences* are even
smaller, ranging from .32 to .08. Taking each individual
story at a time, the significance of all the differences
in its standing in the five grades was tested. Forty-five
differences proved to be three or more times as
large as their standard errors. Fourteen of the 20
stories show statistically significant changes in their
interest value depending on the grade in which they are
read. We may also calculate the significance of
differences between differences. For example, the
difference between stories Aa and Ba in Grade II is 1.2 sigma,
while in Grade III it is -3.3 sigma. The difference
between these differences is 4.5 points. That is, if both
are started from the same point, the total divergence
becomes 4.5 points. This difference and many more of
the same type are well over three times as large as
their standard errors. By this method of analysis, the
trend of each of the 20 stories assumes statistical
significance.

*S.E. difference $= \sqrt{\sigma_1^2 + \sigma_2^2 - 2r_{12}\sigma_1\sigma_2}$, where $\sigma_1$ and
$\sigma_2$ are the standard errors of the two indices and $r_{12}$ is the
correlation between them.
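The standard-error formula for a difference explains why the differences are more precise than the indices themselves: the correlation between grades enters with a minus sign. The sketch below applies that formula. It also reproduces the difference-between-differences arithmetic for stories Aa and Ba, using the sigma values in Table 19. The standard errors of 0.20 are hypothetical values chosen only for illustration. The helper names are likewise illustrative.

```python
import math

def se_difference(se1: float, se2: float, r12: float) -> float:
    """Standard error of the difference between two correlated measures:
    sqrt(se1**2 + se2**2 - 2 * r12 * se1 * se2)."""
    variance = se1 ** 2 + se2 ** 2 - 2 * r12 * se1 * se2
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding

def critical_ratio(diff: float, se: float) -> float:
    """Difference divided by its standard error; the text treats 3 or more as significant."""
    return diff / se

# Stories Aa and Ba, sigma values from Table 19.
gap_ii = 1.2 - 0.0      # Aa minus Ba in Grade II
gap_iii = -1.6 - 1.7    # Aa minus Ba in Grade III
print(round(gap_ii - gap_iii, 1))  # 4.5

# Hypothetical standard errors of 0.20 for both indices (not values from the study).
print(round(se_difference(0.20, 0.20, 0.0), 3))   # 0.283 when uncorrelated
print(round(se_difference(0.20, 0.20, 0.90), 3))  # 0.089 when r = .90
```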

We turn now to the primary problem, that of abso- 
lute grading. Do these stories which the readers in the 
Institute of Character Research place in Grades II, 
III, IV, V, and VI belong in these grades? Or should 
they be moved up or down a grade or more? To test 
this question, it will be convenient to start with three 
alternative hypotheses and an assumption. The assump- 
tion is that the school grade showing the highest interest 
value should be the best grade for a story and that the 
indices should show a progressive falling off as one 
moves away from the grade of highest interest value. 
Our hypotheses will be designated X, Y, and Z.
Hypothesis X is that the A, B, C, D, and E stories should be
placed in Grades I to V. Hypothesis Y is that these 
stories belong in Grades II to VI. Hypothesis Z as- 
serts that these are third-, fourth-, fifth-, sixth-, and 
seventh-grade stories. Which of these hypotheses best 
fits the data? 

Consider first the A stories. According to hypothesis 
X these should be first-grade stories and should show 
declining interest indices from the first to the later 
grades. According to hypothesis Y, these are second- 
grade stories and should show falling indices beyond 
Grade II. According to hypothesis Z, these should be 
third-grade stories and should show a rise in interest
value to Grade III and then a decline. Hypotheses
Y and Z state precisely opposite trends for the A
stories between Grades II and III. Turning to the
data, we find that these four stories behave according
to hypothesis Y. Continuing this testing process with
the B, C, D, and E stories, we have the following
results. Out of 16 tests as between X and Y, the data favor
hypothesis Y in 8 instances and X in 5, while in 3
instances the test is uncertain due to the absence of a
trend. As between Y and Z, out of 16 tests, the data
favor Y in 12 and Z in 4 instances.
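The single-story tests can be made mechanical. The sketch below is one reading of the procedure, not the author's stated algorithm. Each hypothesis fixes a group's best grade. The assumption stated in the text then predicts a rise below that grade and a fall above it. A pair of adjacent grades showing no change counts as uncertain. All helper names are hypothetical. The Grade II and Grade III values are from Table 19.

```python
from typing import Optional

def predicted_trend(peak: int, g1: int, g2: int) -> str:
    """Expected direction from grade g1 to the next grade g2 if `peak` is the best grade."""
    if g2 <= peak:
        return "rise"  # both grades lie at or below the best grade
    if g1 >= peak:
        return "fall"  # both grades lie at or above the best grade
    raise ValueError("g1 and g2 must be adjacent grades")

def observed_trend(v1: float, v2: float) -> Optional[str]:
    """Observed direction between two grades; None when there is no trend."""
    if v2 > v1:
        return "rise"
    if v2 < v1:
        return "fall"
    return None

def supported(peaks: dict[str, int], v1: float, v2: float,
              g1: int = 2, g2: int = 3) -> list[str]:
    """Hypotheses whose predicted direction matches the observed one."""
    seen = observed_trend(v1, v2)
    if seen is None:
        return []  # uncertain test: no trend to compare against
    return [h for h, peak in peaks.items() if predicted_trend(peak, g1, g2) == seen]

# Best grade implied for the A stories by hypotheses X, Y, and Z.
A_PEAK = {"X": 1, "Y": 2, "Z": 3}

# Grade II and Grade III sigma values for stories Aa to Ad, from Table 19.
A_STORIES = {"Aa": (1.2, -1.6), "Ab": (0.0, -0.6), "Ac": (0.0, -0.8), "Ad": (-1.0, -1.8)}

for name, (grade_ii, grade_iii) in A_STORIES.items():
    print(name, supported(A_PEAK, grade_ii, grade_iii))
```

Every A story falls from Grade II to Grade III. The comparison therefore rejects Z but cannot separate X from Y. This matches the statement that Y and Z, not X and Y, predict opposite trends at this step.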

Consider, secondly, the A and B stories. According
to hypothesis Y the A stories should fall and the B
stories rise in interest value between Grades II and III,
while according to X both groups should show declining
indices. Here there are 16 tests. Four are uncertain.
Of the 12 decisive tests the data fit hypothesis Y in
8 and X in 4 instances. Continuing this type of
comparison as between X and Y, there are 96 such tests in
the table. Of these, 32 are uncertain, 45 favor Y, and
20 favor X. Similarly, as between Y and Z, out of 96
tests, 40 are uncertain, 42 favor Y, and 14 favor Z.
Combining these data with the 32 tests of single stories, we
have the following percentages. Of decisive tests as
between X and Y, 32% favor X and 68% favor Y. As
between Y and Z, 75% favor Y and 25% favor Z.
These percentages differ from a chance 50-50 division
by more than five times their standard errors.
Hypothesis Y, that the A, B, C, D, and E stories are second-,
third-, fourth-, fifth-, and sixth-grade stories, fits the
data more than twice as well as hypothesis X and three
times as well as Z. That X and Z so nearly balance
each other also tends to confirm the advantage of
hypothesis Y. A further check is to test whether hypothesis
X or Z best fits the data. Here there are 192 possible
tests, of which 83 are uncertain. Of the 109 decisive
tests, 54% favor X and 46% favor Z. That these two
hypotheses are equally satisfactory points to the
superiority of the intermediate hypothesis Y. It should be
noted also that as between X and Z the uncertain tests
constitute 44% of all tests, while as between X and Y
and between Y and Z the proportion of uncertain tests
is only 34%.
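The combined percentages can be reproduced from the counts given in this paragraph and the preceding one. This assumes the single-story tests (5 for X and 8 for Y; 12 for Y and 4 for Z) are simply added to the paired-group tests (20 for X and 45 for Y; 42 for Y and 14 for Z). The function name is illustrative.

```python
def percent_split(a: int, b: int) -> tuple[int, int]:
    """Share of decisive tests favoring each of two hypotheses, in whole percent."""
    total = a + b
    return round(100 * a / total), round(100 * b / total)

# Decisive counts from the text: single-story tests + paired-group tests.
X_VS_Y = {"X": 5 + 20, "Y": 8 + 45}
Y_VS_Z = {"Y": 12 + 42, "Z": 4 + 14}

print(percent_split(X_VS_Y["X"], X_VS_Y["Y"]))  # (32, 68)
print(percent_split(Y_VS_Z["Y"], Y_VS_Z["Z"]))  # (75, 25)

# "More than twice as well as X" and "three times as well as Z":
print(round(X_VS_Y["Y"] / X_VS_Y["X"], 2))  # 2.12
print(Y_VS_Z["Y"] / Y_VS_Z["Z"])            # 3.0
```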

Again, it is hazardous to generalize from so small a 
number of cases. The data, however, are important 
confirmation of the conclusion that competent opinion 
can be relied upon to place children’s reading materials
correctly not only on a relative but also on an absolute 
grade scale. 



IX 


SUMMARY, INTERPRETATIONS, AND 
PROBLEMS 

As stated in Chapter I, the function of studies of this 
monograph is to mediate between two opposed proce- 
dures employed in the preparation of graded lists of 
superior reading materials for children. One procedure,
adopted by the Winnetka Graded Book List,
relies almost entirely on data collected directly from
children. The other, adopted by the volumes of a 
Guide to Literature for Character Training, relies ex- 
clusively on competent opinion. Agreements and dis- 
agreements between the results of these two approaches 
have been tested as to (1) grade placement and as to 
(2) general worth. 

The results of the studies of the placement of books
in the several school grades are rather decisive and
unambiguous and may be stated somewhat categorically.
The original grades reported in the Winnetka
list agree very well with those reported in the volumes
of the Guide and with competent opinion in general as
to relative grading but do not agree as to absolute
grading. That is, the correlations between the Winnetka
grading and competent opinion are high, while the
average grade and range of grades assigned are in
disagreement. Books which competent opinion places in
Grades I, II, and III tend to be located in Grades IV
and V by the Winnetka list, and books which competent
opinion places in Grades VIII, IX, and X tend to
be located in Grades VII and VIII by the Winnetka
list. These discrepancies are due to the failure of the
Winnetka list to obtain an adequate sampling of ballots
from the earlier and later grades and to its failure
to apply the preferable measure of
central tendency. Corrections were applied for both of
these factors to 159 books common to the Winnetka list
and Volume I of the Guide. When the corrections are
applied, the discrepancies in absolute grading
disappear. Within the ever-present limitations imposed by
necessary errors of measurement, reliance upon competent
opinion or reliance upon data obtained directly
from children should give the same grading. While
the actual grading reported by the Winnetka list is
faulty, the essential ideas involved are entirely sound
and worthy of special commendation. These conclusions
are supported by an experimental study of 20 individual
stories.

The second concern of these studies was to test
agreements between two methods of selecting superior books
for children. The Winnetka Graded Book List purports
to make its selection primarily on the basis of
children’s interests supplemented by the competent
opinion of 13 librarians; the volumes of the Guide rely
exclusively upon competent opinion. Since the introduction
to the Winnetka list leaves the impression that
the librarians’ judgments are unreliable, a necessary
preliminary point which had to be considered was the
agreement between the two sets of competent opinions.
Analysis of the data, however, shows that the reliability
of the librarians’ judgments is exceptionally high.
Further, the estimates of merit by the librarians show
substantial agreement with the rankings of merit
according to the Guide.

The evidence is that the estimates of merit by competent
opinion do not agree with the appeal of the
books to children. The data show very low or negative
correlations between competent opinion and both the
percentage of children liking a book and the average
interest values, and only slightly positive correlations
between competent opinion and the number of ballots
returned, the number of cities reporting, and the so-called
popularity index. The positive correlations are interpreted
as due to the probability that these three indices measure
availability more than they measure the appeal of the books.
An experimental study of 36 individual stories also
indicates that there is very little correlation between what
children like and what adults approve.

These results are of crucial importance in the selection
of superior reading materials. Librarians, teachers,
and parents rightly insist that the reading material
of children be free from objectionable content.
Presumably, children will not read what is uninteresting.
Close agreement between competent opinion and children’s
choices would have meant that the preparation
of selected lists might employ either method or both
combined. The absence of agreement does not prove,
but it does create the inference, that both competent
opinion and the choices of children must be taken into
consideration.

This conclusion is critical both of the volumes of
A Guide to Literature for Character Training and of
the Winnetka Graded Book List. While the judgments
of the librarians indicate that the readers in the
Institute of Character Research have done a reasonably
good job in estimating literary merit and content values,
the absence of agreement with children’s choices
indicates that this is only half the battle. While the
general plan of the Winnetka investigation is sound, the
translation of the plan into concrete terms leaves much
to be desired. Instead of presenting the popularity
index as the best single measure of the general worth
of a book, an index based upon the librarians’ estimates
and upon the percentage liking or interest values should
have been employed. The number of ballots returned
on a book would then have served in a negative role to
eliminate books whose appeal to children was
inadequately measured.

No one should draw the conclusion that these studies 
point the way to methods of selecting superior reading 
materials. There is a need, first, for a good many studies
of the interest value of books and stories under con- 
trolled conditions. There is a need, secondly, for fur- 
ther testing and refinement of the methods of obtaining 
expressions of interest from children on a large scale. 
The percentage-liking and interest- value indices de- 
veloped by the Winnetka study are a start in this direc- 
tion. The author is inclined to believe that these are 
more valid measures of the appeal of books than either 
the number of ballots returned or the number of cities 
reporting or the popularity indices. But just how
valid the percentage-liking and interest-value indices
really are is a question which these studies do not even
touch. Thirdly, there is a need for further refinements
in methods of judging the worth of reading materials 
by competent persons. The Institute of Character Re- 
search has made a notable contribution in this direc- 
tion. It seems to the author, however, that the Institute 
has sacrificed validity for reliability. A greater variety 
of points of view and background while giving less re- 
liable estimates would improve validity. The same 
criticism may be applied even more aptly to the esti- 
mates of the 13 librarians. These are so very highly re- 
liable as to suggest either the absence of healthy differ- 
ences of opinion or that these judgments are not inde- 
pendent at all but derived from some common source. 
Finally, there must be many studies of the function of 
supplementary reading materials in relation to the specific
and more general aspects of the whole educative
process. Is it true, as suggested in the Winnetka Graded
Book List, that a wide variety of interesting, worthwhile,
and properly graded books will aid materially
in solving many of the problems in teaching children
how to read? Is it true, as asserted in Volume II of
A Guide to Literature for Character Training, that the
moral values presented in the story will carry over into 
life? 
A CRITICAL STUDY OF THE BEST BOOKS FOR CHILDREN

(Summary)

This study was undertaken to evaluate the results of two different methods
employed in preparing graded lists of superior reading material for
children. One method, adopted by the Winnetka Graded Book List, depends
almost entirely on data furnished directly by children. The other, adopted
by the volumes of A Guide to Literature for Character Training, depends
exclusively on the opinion of competent adults. The question is whether
such different methods give the same results.

The Winnetka list and the Guide each recommend a particular school grade as
best suited to each book. Although the correlation between the two sets of
recommendations is high, large differences were found in the position of
the average grades and in the range of grades employed. These differences
were found to be due to two faults in the compilation of the Winnetka data.
When these were corrected, the differences disappeared. Correctly employed,
the method of relying on data furnished by children and the method of
relying on competent opinion give the same result. This conclusion supports
the validity of both methods.

The second aim of this study was to test the differences in the choice of
superior books. The Winnetka list chooses them primarily on the basis of
children’s interests; the volumes of the Guide depend exclusively on the
opinion of competent adults. The evidence shows that judgments of merit by
competent opinion do not agree with the appeal of the books to children.
Close agreement between competent opinion and children’s choices would have
meant that the preparation of selected lists of reading material could
employ either method. The absence of agreement does not prove, but does
suggest, that both competent opinion and children’s choices must be
considered. This conclusion is critical both of the volumes of A Guide to
Literature for Character Training and of the Winnetka Graded Book List.

SHUTTLEWORTH


A CRITICAL INVESTIGATION OF TWO LISTS OF THE BEST
BOOKS FOR CHILDREN
(Abstract)

The aim of this investigation was to evaluate the results of two contrasting
procedures that have been used in preparing graded lists of superior
reading material for children. The one procedure, used in compiling the
Winnetka Graded Book List, relies almost completely on data collected
directly from children. The other, employed in the volumes of the Guide to
Literature for Character Training, rests exclusively on the opinion of
competent adults. The question is raised whether such different methods
yield the same results.

Both the Winnetka list and the Guide recommend for each book a particular
school grade as especially suited to it. Although the correlation between
the two sets of recommendations is high, large differences appeared in the
average position and in the range of school grades employed. These
discrepancies could be traced to two faulty procedures in compiling the
Winnetka material. After corrections had been applied, the discrepancies
disappeared. Correctly applied, the method of relying on material collected
from children yields the same result as that of relying on the opinion of
competent adults. This finding supports the validity of both procedures.

The second concern of this investigation was to test the agreement in the
selection of superior books. In the Winnetka list the selection is
determined predominantly by the interests of children. In the volumes of
the Guide, as stated, reliance is placed entirely on the opinion of
competent adults. It appears that judgments of worth according to competent
opinion do not agree with the appeal that the books have for children.
Close agreement between competent opinion and children’s choices would have
indicated that either procedure is suitable for compiling lists of selected
reading material for children. The lack of agreement does not prove, but
does indicate, that both competent opinion and children’s choices must be
taken into account. This conclusion is critical both of the volumes of the
Guide to Literature for Character Training and of the Winnetka Graded Book
List.

SHUTTLEWORTH
PREFACE


This work represents a careful and thorough review 
of the literature on respiratory exchange determination 
directed especially towards enlarging our information 
on the effects of muscular activity, recovery from such 
activity, and fatigue factors on the oxygen consumption 
of individuals. It differs from other available reviews 
in that the author has made no attempt to enlarge or 
complicate the work by introducing pathological con- 
ditions, or other phases of metabolic studies not directly 
pertaining to his principal theme. The section which 
deals with methods is especially worthy of note in that 
the author has spared no effort to include all of the
references pertaining to methods involved in this type
of technique. The physiology introduced is of an
elementary nature since the author has compiled this work
for psychologists and industrial workers who are not 
completely trained in physiological methods. 

The work gives promise of being a valuable refer- 
ence guide to industrial workers and psychologists in- 
terested in quantitative measurements of activity and 
efficiency of the human body as determined by the O₂
consumption and CO₂ excretion methods.

M. M. Kunde 

Department of Medicine 
The University of Chicago 
Chicago, Illinois 
INTRODUCTION


There is available at the present time no general 
manual in English on methods, apparatus, or results in 
the field of measuring the energy cost of human work,
so that experimentation of this kind in the laboratory 
or in industry is denied to all except those who have 
had extensive training in physiology. Since the fac- 
tors influencing metabolism are so numerous and com- 
plex and the sources of error in metabolic experiments 
are so varied and often unexpected, perhaps it is as well 
that there is no such manual at hand, for it would be 
likely to tempt novices into engaging in experimental 
work before they had acquired an adequate theoretical 
background. 

Our grounds for this fear lie in the fact that the 
measurement of the energy cost of human work is a 
most attractive field to the industrial engineer and to 
the industrial psychologist in that it offers promise of 
help in the solution of many of their most baffling
problems. As a matter of fact, several workers whose
training in physiology has been limited have already essayed
original investigations using metabolic techniques and 
have published research which reveals this unfortunate 
lack of fundamental preparation in physiological 
theory. 

The investigators responsible for this type of work 
have made reasonable efforts to place their research 
above criticism and are not to be censured for their at- 
tempts to introduce these methods into their attack 
upon industrial problems. Without these techniques, 
industrial psychologists and engineers have been forced
to satisfy themselves with conjectural answers to many 
problems relating to the interpretation of different 
types of work curves and the ultimate effect of various 
factors upon industrial efficiency. For example, it has 
been known that production can be accelerated through 
certain incentive plans, but it has not been possible to 
determine whether the acceleration is the result of more 
efficient modes of work, or the expenditure of excessive 
energy, or both. Motion studies might help reveal
whether the act was being accomplished more
efficiently, but there has been no means of detecting
possibility of disproportionately increased energy con- 
sumption until it made itself known months later in the 
form of sickness, accidents, labor turnover, or indus- 
trial unrest. 

There is a need, then, for a means of rendering acces- 
sible to the industrial engineer and the industrial psy- 
chologist the literature which will enable them to uti- 
lize these methods properly in attacking the broad class 
of industrial problems indicated, and to enable them 
to perform fundamental research in establishing the 
limits and the validity of the methods. It is the writer’s 
object in preparing this guide to the literature on res- 
piratory exchange determination to supply information 
which will be helpful to such a program in two sep- 
arate, although related, directions. In the first place, 
this guide should assist any well-trained psychologist 
or engineer to plan physiologically suitable, adequately 
controlled, experimental investigations in the labora- 
tory and in the factory; and, in the second place, it 
should supply a basis for critical appraisal of current
research reported by workers already employing meta- 
bolic techniques. It seems to the writer that these two 
ends are about equally important, and ultimately equal- 
ly constructive. 

It must be understood at the outset that this mono- 
graph consists only of a guide to the literature; it is im- 
possible to arrive at any understanding of energy meta- 
bolism and respiratory exchange from merely reading 
this monograph. It is essential that the reader main- 
tain constant access to a good library throughout the 
period of his study. We have assumed an adequate 
grounding in scientific method in general on the part 
of the reader, but we have also needed to assume a com- 
plete absence of technical training in physiology. 

No investigator can hope to plan experiments in the 
measurement of energy consumption by respiratory ex- 
change methods without a good working knowledge of 
the general physiology of digestion, circulation, respira- 
tion, and muscular work, and it should be unnecessary 
to add that eventual practice in the manipulation of the
apparatus chosen for the research is equally vital. It 
would be desirable for the investigator to have direct ac- 
quaintance with many types of metabolic apparatus, 
but familiarity with them through the literature will 
probably have to suffice for most workers. Judicious 
selection from the references in this syllabus should en- 
able anyone to give himself a course of training in the- 
ory and a general familiarity with methods and appara- 
tus that will be adequate for the purpose. 

The bibliography* supporting this study is by no
means exhaustive within the subjects covered, and many 
related fields, such as animal experimentation of all 
kinds, have been wholly excluded. Those who need 
further references should consult the bibliographic 
notes which precede the bibliography. Although this 
guide was not written for the professional physiologist, 
the author believes that readers in this group will find 
the references on methods and apparatus more com- 
plete than in any other compilation available in Eng- 
lish, and may find certain other sections of interest, 
particularly certain of the sections on factors affecting 
metabolism. 

*The bibliography covers the literature through 1928 only. The
author regrets this delay in publication, but feels that it should be
possible for the reader to cover the intervening period without undue
difficulty.



I 

PHYSIOLOGICAL FOUNDATIONS 

Theory of Respiratory Exchange Determination.
The theory of metabolic measurement through respira- 
tory exchange determination may be outlined in its bare 
essentials as follows: 

Animals derive the energy they expend in work and 
heat from the oxidation of foodstuffs within the body. 
The process involves, among many other things, the 
withdrawal of oxygen from the inspired air, and the 
formation of carbon dioxide, which is added to the ex- 
pired air. A certain minimum rate of such gaseous ex- 
change is a necessary accompaniment of the processes 
responsible for the maintenance of the body tempera- 
ture, the beating of the heart, and similar activities, 
and is an index to what is known technically as the 
basal metabolic rate. Beyond this point, muscular
work involves the oxidation of foodstuffs within the 
tissues in a fixed proportion to the amount of work 
done. To determine the amount of work done, then, 
would be a simple matter if we could in some way dis-
cover the amount of foodstuff which has been oxidized
during the work period in excess of that required for 
the basal metabolic processes. We cannot hope to 
measure the amount of foodstuffs oxidized in the body 
by any direct means, but, ignoring certain qualifications 
which will be more adequately treated in later sections, 
we can derive a computed figure of the amount if we
know either the volume of oxygen which has been con-
sumed or the volume of carbon dioxide given off dur-
ing the period in question.
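The outline above reduces to a short calculation: total oxygen consumed during the work period, minus the oxygen the basal processes would have used in the same time, multiplied by a caloric equivalent of oxygen. The sketch below illustrates that arithmetic only and is not a method taken from any of the works cited. It assumes that basal oxygen consumption has been measured separately as a rate, and it uses a single caloric equivalent of 4.825 kcal per litre of oxygen, the value commonly tabulated for a respiratory quotient near 0.82. The qualifications deferred to later sections, chiefly the dependence of that equivalent on the respiratory quotient, are ignored. The function name is hypothetical.

```python
# Net (work) energy cost from oxygen consumption: O2 used during the work
# period, minus the O2 the basal processes would have used in the same time,
# times a caloric equivalent of oxygen.

# ASSUMPTION: 4.825 kcal per litre of O2, the value usually tabulated for a
# respiratory quotient of about 0.82; the true figure varies with the RQ.
KCAL_PER_LITRE_O2 = 4.825


def net_energy_cost_kcal(o2_work_litres, basal_o2_litres_per_min, minutes,
                         kcal_per_litre=KCAL_PER_LITRE_O2):
    """Hypothetical helper: energy (kcal) attributable to work above basal."""
    if minutes <= 0:
        raise ValueError("work period must be positive")
    basal_o2 = basal_o2_litres_per_min * minutes  # O2 basal processes use anyway
    excess_o2 = o2_work_litres - basal_o2         # O2 attributable to the work
    return excess_o2 * kcal_per_litre


# Hypothetical figures: 30 L of O2 over a 20-minute work period,
# with basal consumption of 0.25 L per minute.
print(round(net_energy_cost_kcal(30.0, 0.25, 20), 2))
```

Carbon dioxide output could be substituted for oxygen in the same way, given a corresponding caloric equivalent per litre of carbon dioxide.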

Fatigue and Energy Cost. Even this brief outline
should serve to save the reader from confusing the con- 
cepts of "fatigue" and "energy cost." This is a distinc-
tion which has not always been clear, and even today
the two concepts are sometimes mingled in a way that 
indicates rather hazy thinking on the matter. This is 
recognized, for example, by Strauss and Bandmann 
(547). In their review of the methods of fatigue
measurement they commend the accuracy of metabolic 
methods but recognize that the concept of energy ex- 
change is not equivalent to the fatigue concept. They 
suggest that metabolic methods should be particularly 
useful in studies of the effect of rest pauses and cite in 
this connection Hill, Long, and Lupton’s (285) study 
of the recovery period. Frois and Caubet (552) have 
also criticized respiratory exchange measures as a test 
of industrial fatigue. 

Polakov (468, 469), one of the few American work- 
ers to utilize respiratory exchange techniques, has fall- 
en into the error just mentioned in his proposal to 
employ carbon dioxide as an index of fatigue. This 
proposal was discussed at a meeting of the American 
Society of Mechanical Engineers (10a) and was ably
criticized by F. B. Flynn in a written discussion sub-
mitted after the meeting. The general tenor of the
discussion, however, remained more or less favorable to 
the method, and Dana (152) and others have quoted 
Polakov’s articles on the subject. 

Other workers who have distinguished more care-
fully between these concepts are such men as Amar
(9) in France and Atzler (16, 19, 21) in Germany. The
reader may also be referred to the work of Herbst
and Nebuloni (270) in this connection. Waller (602) 
made wide use of carbon dioxide production as a 
measure of energy cost, and discovered a progressive 
increase in cost with continued muscular work. He 
attributed this decrease in efficiency to the effect of 
fatigue, but it must be understood that Waller made a 
clear distinction between the two concepts and was 
merely interested in discovering their relation to each 
other. 

In an article by Page (458) the energy cost concept 
is defended as being more susceptible to meaningful 
quantitative treatment than any of the usual fatigue 
concepts, and hence of more value in the majority of in- 
dustrial investigations than the industrial fatigue con- 
cept. This paper was developed in part from an earlier 
contribution by Muscio (443) and in it the stand is 
taken that there is little relation between the subjective 
concept of fatigue, the physiological concept, and the 
industrial output concept, and that the concept of en- 
ergy cost provides a more generally useful working 
basis for the industrial psychologist and engineer than 
any one of these. Because of the similarity of the
problems to which fatigue studies and energy cost de- 
terminations may be applied, however, we have in- 
cluded a number of the more relevant references to the 
fatigue literature in our bibliography, and shall refer to 
them very briefly here. 

Literature on Fatigue Measurement. The references 
to Vernon (569), Florence (194, 196), and Farmer 
(189) may be taken as examples of output studies of 
fatigue, and the papers by Martin (404) and Haggard
(241) as expositions in a semi-popular vein of the phy- 
siological view of fatigue. More technical treatments 
on the physiological side are those of Hastings (255) 
and Durig (178), and a well-organized statement of the 
problem of industrial fatigue phrased in modern phy- 
siological terms has been written by Spaeth (533). 
This work carries a bibliography of several hundred 
titles topically classified. Typical physiological fatigue 
tests are represented by Martin (405), Ryan (493), 
and Strauss (546), and the entire field of fatigue 
measurement has been reviewed by Sachsenberg (494) 
and by Strauss and Bandmann (547). Negative results
with physiological fatigue tests are reported by Net-
schajeff (445) and by Lee and Vanbuskirk (371). A
medical approach is represented by Mayers (411) and
Ochsner (451).

The work of the Industrial Fatigue Research Board 
of London is briefly described by Vernon (275) and 
by Wilson (616). Wilson (615) and Bordas and Cour- 
tier (107) have discussed the value of fatigue preven- 
tion and Wilson presents a plea for the international 
pooling of information and facilities through the Gen- 
eva office. There seems to have been no action taken 
on his proposal, however. 

General Physiology. Several modern books on ele-
mentary physiology are available, of which Mottram
(437) and Douglas and Priestley (170) may be taken 
as examples. Mottram represents a recent populariza- 
tion of physiology useful in establishing an integrated 
background. There are chapters on nutrition, respira-
tion, circulation, and muscular work, as well as the
other typical physiological headings. More advanced 
treatments will be found in Stewart (538), in Bayliss
(38), and in Howell (299). These are standard text- 
books of college grade. The reference to Waller (580) 
is rather old but has been included because of the im- 
portance of his later studies on the energy cost of vari- 
ous types of industrial work. In spite of the early date 
at which the book was written, his chapter on respira- 
tion shows strikingly modern insight, and the other 
chapters on nutrition and related subjects are in many 
instances subject to but slight revision now. Bain- 
bridge (30) has written a comprehensive book on the 
physiology of muscular exercise. The entire volume 
is organized principally as a series of abstracts of the 
numerous French, German, and English works in- 
cluded in his extensive bibliography. Evans (186) 
has chapters on recent developments in muscular phys- 
iology, muscular contraction, oxygen consumption, 
carbon dioxide output, and related topics. 

Stiles (539) has written a clear and readable intro- 
duction to the subject of nutritional physiology. The 
book may be highly recommended as introductory ma- 
terial for the novice. Lusk (387) is the standard Eng- 
lish reference on this subject and is written from a 
more advanced standpoint. His Fundamental Basis of 
Nutrition (389) is a more recent work on this subject. 

It would be futile to demand that the engineer or 
psychologist acquire a knowledge of biochemistry be- 
fore starting experimentation, but he should at least 
familiarize himself with the general scope of this field. 
Morse (435) has written a compendious student manual
which should prove useful in providing technical back- 
ground for reading modern writings on the physiology 
of muscular exercise. On pages 23-26 of this reference 
will be found periodical and handbook references of 
a general nature. The book also contains useful chap- 
ter bibliographies. Bodansky (92) has written a read- 
able textbook on the subject, although he necessarily 
assumes a knowledge of organic chemistry on the part
of the reader. Pryde (474) and Sumner (550) may be 
cited as further references in the field of biochemistry. 

An attempt to present physiological concepts from 
the engineering point of view has been made by Briggs 
(113) in an article which stresses the resemblances be- 
tween the human body and a steam engine and makes 
a plea for the study of the human body by the engineer. 
This study will be referred to later in connection with 
his concept of the “crest load” or the point which can-
not be exceeded without overloading the human ma-
chine. Dana’s (152) book is also useful for engineers,
although the physiological phases are not covered very
adequately. McCurdy (418) has written a textbook
on the physiology of exercise. This is a good general
study with bibliographies at the close of each chap-
ter. The book may be recommended to the general
reader. Another book, in which the mechanical and
engineering aspects are as prominent as the physiologi-
cal ones, is The Human Motor by Amar (10). This
book treats of the laws of mechanics in a purely physi-
cal sense and then applies the laws to the structure and
mechanics of the human body. There are chapters on
the chemistry of alimentation, on the mechanics of hu-
man energy expenditure, on the factors affecting the 
economy of human work, on the various types of ap- 
paratus used in diverse kinds of human measurement 
and on the variety of metabolic studies which have 
been made of walking, climbing, bicycling, and in- 
dustrial work. McDowall (419) discusses in a non- 
technical way the physiology of exercise and mental 
work in their relation to industry. 

Basal Metabolism. The reader who has covered
even a few of the references given above will realize
the importance of understanding basal metabolism and
its relation to the metabolism of muscular work. Ba-
sal metabolism represents the minimal energy consump-
tion of the body, or the energy requirements for
maintaining the body temperature, for providing the
energy for the beating of the heart, and for other similar 
physiological processes of a fundamental and continu- 
ous nature. The best-known manual in English is prob- 
ably that of DuBois (172). This book, written for 
medical students, contains summaries of the relevant 
laws of physics and chemistry in their relation to basal 
metabolism and chapters on nearly all phases of this 
subject. It may be highly recommended to the begin- 
ner on account of its wide scope, profuse references, 
and scholarly organization. One of the best brief pres- 
entations in English of the theory and practice of basal 
metabolism will be found in the first two chapters of 
King’s Basal Metabolism (321). There is a bibli- 
ography of about three hundred references, most of 
which are recent and valuable. Probably the most
complete manual in German is that of Grafe (225),
and a useful French manual is that of Terroine and
Zunz (558).

Many of the Carnegie Institution of Washington pub-
lications carry useful digests and discussions of various
phases of basal metabolism. Benedict and Carpenter 
(70), for example, discuss the significance of various 
factors in the measurement of basal metabolism, and 
Harris and Benedict (251) give a more technical ac- 
count of a biometric study of basal metabolism in man. 

Very interesting accounts of the history of basal meta- 
bolism, fundamental concepts underlying it, its meas- 
urement and the various factors influencing it have also 
been written by Benedict in 1925 and 1928 (56, 63). 
The historical point of view is represented also by 
Lusk (391) and Thannhauser (559). Lusk's history 
of metabolism runs from Socrates and Hippocrates to 
Rubner and Zuntz and is a document full of human in- 
terest. A very complete summary of the experimental 
work on basal metabolism was prepared by Boothby 
and Sandiford (102) in 1924. Their discussion is
based upon a bibliography of 697 titles. Talbot (555)
has summarized the literature on basal metabolism of 
children, with a bibliography of 169 titles. Lusk (390, 
392) discusses early work on developing the law of sur- 
face area. Miscellaneous references of lesser import- 
ance are those of Benedict (52), Kahn (314), and 
Porges (472).

General Metabolic Exchange and Energy Metabol-
ism. The student who is unable to read German will
find the majority of reference manuals on this subject 
closed books to him. A study of the appropriate sec-
tions of the references quoted under the heading of
basal metabolism, however, may make it possible to 
dispense with the assistance of the more general man- 
uals for most practical purposes. The leading manuals 
and texts are listed in chronological order below. 

The textbook by Tigerstedt (560) will be found to 
be quite rich in references earlier than 1905. The ref- 
erence to Magnus-Levy (397) is available in English 
translation and hence may be read by the general reader 
interested in the early development of the physiology 
of metabolism. Johannson (308) contributes pictorial 
illustrations of the various types of respiration appara- 
tus available in 1910. Lefevre (373) has written a very 
broad and inclusive treatment of the whole field of 
metabolism, direct and indirect calorimetry, heat reg- 
ulation and other related physiological topics. It is 
still of considerable value as a reference. The article 
by Murlin (440) carries descriptions of all the stand- 
ard methods and a good discussion of theory and 
practice. The most serious limitation of his contribu- 
tion is that there are no specific references, although 
the article is filled with quotations from many workers. 
Grafe’s (225) manual treats all phases of general 
metabolism in a comprehensive way and with adequate 
reference to the literature. It may be highly recom-
mended to those to whom it is available. The
Abderhalden (3) handbook is not so useful for our 
purposes as its title might imply. Only one article in 
the collection is of interest in studies of respiratory
metabolism during work, and this one, Johannson
(309), is not primarily concerned with gaseous ex-
change, but considers it only in its relation to general
metabolism. The review by McCann (412) is of more 
value to the physician than to workers in industry. 
There is some discussion, nevertheless, of the cost of 
work and the effect of miscellaneous factors on meta- 
bolism. There is a bibliography of over three hundred 
titles. Volume IV, Part 10, of the Abderhalden (2)
series consists of an exhaustive treatment of gaseous 
metabolism and calorimetry in the form of a sympo- 
sium by about twenty leading authors in these fields. 
It comprises a complete working manual on all types 
of apparatus and methods. Atzler’s contribution to 
the symposium Körper und Arbeit (18) outlines the
general theory of metabolism and its measurement. The 
reference to Caccuri (118) was not available to the re- 
viewer. One of the most modern manuals of practical 
scope is that of Knipping and Rona (343). The sec-
tions on energy metabolism, pages 98-195, and work
metabolism, pages 223-240, constitute a clear descrip- 
tion of modern techniques, well organized and 
illustrated with splendid cuts and diagrams. The best 
modern methods of both direct and indirect calorimetry 
are discussed, and the book includes many useful points 
of practical technique. The textbook by Krauss (348) 
is another comprehensive treatment of indirect calori- 
metry by both open and closed circuit methods. Krauss 
describes and illustrates a wide variety of experimental 
methods but is not much concerned with work 
experiments. There are several useful tables and a 
bibliography, pages 305-312, of over one hundred titles 
on gaseous exchange, chronologically arranged from
Lavoisier to date. In this section should also be men-
tioned the review by Murlin (442) of metabolism in
infancy and childhood. This article treats its subject 
broadly and is supported by a bibliography of over 
two hundred titles but is not of much interest in con- 
nection with work metabolism in spite of the general 
introductory material included. The majority of the 
above references are of a theoretical or general nature; 
references more specifically concerned with apparatus 
and methods will be supplied in a later section. 

Respiratory Physiology and Blood Chemistry in
Relation to Muscular Work. It is essential to know
something of the physiology of internal and external 
respiration and their relation to blood chemistry and 
pulmonary ventilation respectively in order to under- 
stand the technical material on the biochemistry and 
dynamics of muscle action and the effects of muscular 
work on the body as a whole. Although the majority 
of investigators will need only a few specific references 
on this subject in addition to the material already 
covered in the general references mentioned in preced- 
ing sections, it is desirable for all experimenters to be 
familiar with a few of the general references and in 
certain instances more specific material may be needed. 
It should be understood, then, that not all of the references
given below are recommended for general reading.

Newcomers in this field are fortunate in having 
available Haldane’s (243) splendid historical treat- 
ment of the scientific investigation of the problem of 
respiration. This book provides the best possible sub-
stitute for the experience of having worked many years
in the field of respiration and the opportunity of fol-
lowing its development discovery by discovery. The
volume is not always easy reading, although limited to 
elementary mathematics and chemistry. There are
chapters on such subjects as carbon dioxide and the 
regulation of breathing, the nervous control of breath- 
ing, the blood as a carrier of oxygen, the effects of the 
want of oxygen, blood reaction and breathing, blood 
circulation and breathing, and the effects of various 
factors upon respiration. There is an appendix on 
blood chemistry. 

A. V. Hill is well known not only for his scientific
contributions to the theory of muscular exercise, but
also for his semi-popular works on muscular activity
(281-284). These references will be considered more
in detail in a later section and are mentioned here only
because the breadth of their scope makes them appro- 
priate. A somewhat earlier reference which remains 
standard is that of Krogh (354). All phases of the sub- 
ject of respiratory exchange of animals and men are dis- 
cussed and there is a well-organized and extensive bibli- 
ography to 1916. Reference is made here to the early 
work of Hanriot and Richet (247) principally as a mat-
ter of record and classification. The reference will be
considered again in connection with special methods in 
work experiments. 

Gesell (212) has written a review of the current the- 
ories of the chemical regulation of respiration, with em- 
phasis on his own view. The article is concise and well 
written and has a bibliography of 141 titles. He wrote 
a similar article the next year in Science (211), ex-
pressed in more popular terminology. Scott (509) may
be taken as an example of technical research of the type
reviewed by Gesell. Scott’s experiments seemed to de-
monstrate that "undissociated carbon dioxide acts as
a specific respiratory hormone. Therefore the physi- 
ological effects of carbon dioxide on respiration can- 
not be attributed solely to its acid properties when in 
solution" (as opposed to the view that carbon dioxide ex-
cites the respiratory center only through its action on
the hydrogen ion concentration of the blood, to which 
the body makes adjustment). See also Bald (31) for 
a discussion of the regulation of respiration in terms of 
blood chemistry. 

The first five chapters of Henderson and Haggard 
(266) constitute a useful exposition of modern respira-
tory physiology. This may be recommended to all
workers. DuBois-Reymond (99) contributes an ar- 
ticle on methods of studying the mechanics of respira- 
tion. Takahira (553) is quoted by DuBois in Basal 
Metabolism, page 21, as a good general review of the 
subject of respiratory metabolism. This was not avail- 
able in September, 1928, in the John Crerar Library, 
the University of Chicago Library, or the Library of 
Congress, Washington. The reference to Brunton 
(115) consists of a collection of clinical and experi- 
mental papers of a minor nature, most of which were 
originally published in the eighties. Campbell, Doug- 
las, and Hobson (121), in discussing their experiment 
on the respiratory exchange of man during and after
muscular exercise, presented an extended account of 
the physiology of the various processes involved. They
were concerned particularly with the rise in the res-
piratory quotient immediately following work and the
reasons for various changes in gaseous metabolism. 
This reference will be discussed later in connection with 
muscular work. In the article by Krogh and Lindhard 
(356) will be found a theoretical discussion of the 
mechanism which provides the very rapid adaptation 
of the respiratory and circulatory systems to sudden 
muscular exertions. 

Aitken and Clark-Kennedy (6) divided a single ex-
pired breath into six successive portions in their study
of the fluctuations in the composition of alveolar air
during the respiratory cycle in muscular exercise. A
preliminary announcement of their apparatus, methods,
and conclusions was written in 1927 (5) in which it
was stated that “during the tenth minute of moderate
work on a bicycle ergometer a single breath, two to
three and one-half liters, is divided up into six succes-
sive portions by means of a special apparatus. The car-
bon dioxide concentrations in the six portions are
plotted against the respective volumes, a smooth curve
being drawn.” They found a typical S-shaped curve
from the origin to about 1100 cc., followed by a straight
portion sloping gently upward. The average carbon
dioxide percentage is lower than that of the last 10 cc.
alone. Briggs (112) develops his conception of physi-
ological overload in some detail in a study on physical
exertion, fitness, and breathing. "When exertion of
steady, increasing magnitude is undertaken the expired
carbon dioxide percentage first rises and then falls."
He considers that this marks the point beyond which
there is physiological overload. 
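The observation of Aitken and Clark-Kennedy that the average carbon dioxide percentage of a breath is lower than that of its last portion follows from how the average is formed. Each portion contributes in proportion to its own volume, and the first portions, drawn largely from the air passages, carry little carbon dioxide. The sketch below computes such a volume-weighted mean. The function name and the six-portion figures are hypothetical, chosen only so that the breath totals 3 litres, within the range quoted above; they are not the authors' data.

```python
def mean_co2_percent(portions):
    """Hypothetical helper: volume-weighted mean CO2 percentage of a breath.

    portions: list of (volume_cc, co2_percent) tuples, in the order expired.
    """
    total_volume = sum(v for v, _ in portions)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    # Volume of CO2 (cc) in each portion = portion volume x its CO2 fraction.
    co2_volume = sum(v * pct / 100.0 for v, pct in portions)
    return 100.0 * co2_volume / total_volume


# Hypothetical six-portion breath (not data from the study): early portions
# come largely from the air passages and carry little CO2.
breath = [(400, 1.0), (500, 4.0), (500, 5.5), (500, 6.0), (500, 6.2), (600, 6.4)]
mean = mean_co2_percent(breath)
print(round(mean, 2), "vs last portion", breath[-1][1])
```

Because the low-CO2 early portions are averaged in, the mean for the whole breath falls below the concentration of its final portion.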

Of practical importance in the selection of measure-
ment techniques is the assurance from the experimental
results of Periera (463) that oxygen consumption when 
one is breathing pure oxygen differs practically not at 
all from the consumption when breathing atmospheric 
air. The validity of this assertion has been attested to 
by a large number of investigators. Padget (456) has 
shown that inspiration of carbon dioxide causes an 
immediate rise in the carbon dioxide tension of arterial 
blood, but a lag in other responses, perhaps due to time 
required to saturate the tissues, especially the respira-
tory center. The reaction of any one individual to car-
bon dioxide is extremely constant but it varies greatly
among different individuals. Another experiment on 
the effect of breathing different concentrations of car- 
bon dioxide is reported by Goldstein and DuBois 
(221). Means (422) has written an extensive review
of the many phases of the subject of dyspnoea. His 
bibliography and treatment of general subjects may be 
of some interest to investigators in work metabolism. 
Perwitzschky (464) has investigated the temperature 
and moisture of air in air passages under normal en-
vironmental conditions and normal depth of breathing
in resting man. Jahn (304) has studied the relation 
between oxygen consumption and carbon dioxide con- 
centration in an experiment on the specific dynamic 
action of food and the laws of gaseous exchange. Doug- 
las and Haldane (168) studied the capacity of the air 
passages under varying physiological conditions. Their 
data on the results of experiments on walking, given
originally in tabular form, were plotted by Page (458), 
who was interested in demonstrating the sensitivity of 
ventilation rate alone as an index of energy metabolism. 

Ponzo (470, 471) discusses psychological influ- 
ences upon the rate and character of breathing. There 
is momentary suspension of breathing during close at- 
tention, acceleration and retardation accompanying 
thoughts with different affective tones, and disturbance 
due to slight laryngeal movements during thinking. 
Specific examples are given, such as slower respiration 
with work at scientific instruments and during reading 
and other mental work. 

Mathieu and Schaeffer (409) report briefly on ex-
periments indicating an inverse relation between car-
bon dioxide concentration and respiratory frequency.
It was possible to establish certain relations of a de- 
finite sort when the averages of groups were taken. Of 
considerable importance in connection with the Waller 
method of estimating energy consumption (to be 
discussed in a later section) is the article by King and 
Cross (323) on superventilation and carbon dioxide 
elimination. The reference will be discussed in further 
detail in connection with the theoretical definitions of 
the Waller method. Durig (179) reports experiments 
on gaseous metabolism performed in connection with 
the Monte Rosa expedition. Useful theoretical dis-
cussions of respiratory exchange and respiratory phys-
iology will be found in several of Benedict's studies 
of which Numbers 54, 58, 70, and 82 (of our biblio- 
graphy) may be taken as examples. 

It is difficult, and perhaps not highly essential, to
separate the discussion of blood chemistry in relation 
to muscular work from the consideration of respiratory 
physiology in the same connection. The two topics 
have necessarily been treated together in most of the 
references given above so that the student should now 
understand the relationship of respiratory measurement 
to the fluctuations in the composition and hydrogen ion 
concentration of the blood stream. A good deal of 
modern work is being done on direct measurements in 
blood chemistry rather than the measurement of res-
piratory exchange. For this reason we shall need to
mention a few references specifically related to blood 
chemistry, even though our principal interest in pre- 
paring this guide to the literature is in connection with 
indirect calorimetry by means of respiratory exchange 
measurement. 

The articles by Hastings (254, 255) are useful studies 
of changes in the blood following muscular work. His 
“Physiology of Fatigue” (255) constitutes a significant 
contribution to the theory of muscular exercise. It is 
based upon a study of the hydrogen ion concentration 
and the composition of the blood of dogs working on a 
tread-mill. The articles by Barr, Himwich, and Green 
(34) are written in three parts; Part I, “The Changes 
in the Acid-Base Equilibrium Following Short Peri- 
ods of Vigorous Muscular Exercise”; Part II, “A 
Comparison of Arterial and Venous Blood Following 
Vigorous Exercise”; Part III, “The Development and 
Duration of Changes in the Acid-Base Equilibrium.” 
There are good bibliographies with each section. The
contributions of Lundsgaard and Möller (386) are also
in three parts: Part I, "Oxygen and Carbon Dioxide 
Content of Blood Drawn from the Cubital Vein Before 
and After Exercise"; Part II, "Oxygen and Carbon 
Dioxide Content of Blood Drawn from a Cubital Vein 
at Different Intervals After Exercise"; Part III, "Ef- 
fect of Varying the Amount and Kind of Exercise." In 
the paper by Bock, Dill, Hurxthal, Lawrence, Cool- 
idge, Dailey, and Henderson (88) will be found a de-
scription of the principal physico-chemical properties
of the blood of a normal man in a steady state of work 
and the changes in the blood accompanying the change 
from rest to work. 

Van Slyke (567) has written a technical review of 
the chemistry of the carbon dioxide carriers of the 
blood in terms of buffers, hydrogen ion concentration,
etc. Of theoretical interest principally is the article
by Schneider and Truesdell (506). An increase in
the carbon dioxide content of the blood in man was 
found to raise the blood pressure and increase both the 
volume and frequency of breathing. In their studies 
several types of physiological changes were noted in 
detail. Barcroft (33) has written a comprehensive 
manual on The Respiratory Function of the Blood.

The attention of psychologists who may be using 
these bibliographic notes is drawn to the article by 
Starr (537) on the relationship of high alveolar car- 
bon dioxide tension to the etiology of stammering. This 
subject is perhaps not of interest to industrial engineers,
but is an interesting application of the study of blood
chemistry to a practical problem often studied by psy-
chologists.

The Biochemistry and Dynamics of Muscle Action. 
Most of the work on this subject has centered about the 
investigations of the British physiologist, A. V. Hill, 
and his colleagues. He has contributed extensively 
both to the literature of pure science and to semi-popu- 
lar exposition. Early articles of a reasonably non-tech- 
nical nature were written in 1924 (282) and 1925 
(283). This latter was written with characteristic 
clarity and conciseness and may be highly recom- 
mended to those wishing to read only twenty-five pages 
of theory. In 1926 (281) he published a more exten- 
sive treatment of muscle physiology with ample cita- 
tions of the literature and descriptions of recent work. 
It is written in as non-chemical and as non-mathemati- 
cal a way as it is possible to write such a book. A more 
widely-known account is his Muscular Movement in 
Man (284), published in 1927. The first half of this 
book treats of the modern physiology of muscular work 
and the last half is concerned with the viscosity of hu- 
man muscle, the dynamics of sprinting, the mechanical 
efficiency of human muscle and many matters of the- 
oretic interest to the biological chemist and to scienti- 
fically inclined athletes. The bibliography contained 
in his book, Muscular Activity, is carried up to date, 
i.e., from 1924 to 1927. 

Hill and his collaborators have produced an exten- 
sive scientific literature which can be only briefly 
sampled here. Hill and Lupton (286) present an ex- 
tended and semi-technical account of the physiology of 
exercise in man. This work is not a report of an ex-
periment but is a general discussion with experimental
illustrations. The reference is valuable for theory and 
bibliography and constitutes a more complete treatment 
of some of the material found in Muscular Movement
in Man. The reference to Hill, Long, and Lupton
(285) is one of an extensive series in the Proceedings of
the Royal Society. Volume 96 contains Parts I to III,
which consist of an extended treatment of muscle phy- 
siology with bibliography and mathematical and 
chemical discussion. “Lactic acid in muscle does not,
to any serious extent, directly turn out carbon dioxide 
from bicarbonate. It combines with sodium protein 
and raises the hydrogen ion concentration; the elimina- 
tion of carbon dioxide which results is the consequence 
of the induced activity of the respiratory system." The 
respiratory quotient was found to fluctuate widely after 
severe exercise but was not seriously affected by mild 
exercise. It was found to depend upon lactic acid, 
with its influence upon the respiratory center through 
the effect of hydrogen ion concentration. The recovery 
process is studied in detail in Parts V and VI. Careful 
distinctions are drawn between the effects of “severe” 
and “moderate” exercise, and their physiology is sep-
arately discussed.

Hill’s theory that muscles utilize carbohydrate only 
during work is the center of active controversy. Hill’s 
view is supported by Bock, van Caulaert, Dill, Folling,
and Hurxthal (91), who find that glycogen is the prin-
cipal immediate source of energy for muscular con-
traction. Krogh and Lindhard (357) publish experi-
mental results showing a higher mechanical efficiency
under carbohydrate fuel, indicating that it is carbohy-
drate that is actually utilized in the reactions and that
fat must first be converted into carbohydrate, with con-
sequent loss of mechanical efficiency. DuBois (174)
also supports Hill’s views in a study of metabolism in 
disease and in health. He emphasizes the difference 
between the nature of the diet and the nature of the 
foodstuffs actually metabolized and shows the necessity 
of interpreting experimental results in the light of this 
fact. 

Himwich and Castle (292) and Himwich and Rose 
(293) have studied the respiratory quotients both of 
resting and of exercising muscle and found that in each 
case the muscle had a respiratory quotient, not of unity, 
but of practically the same order as that of the body as 
a whole. This indicates that not only carbohydrate but
also some fat and protein foodstuffs are consumed.
Fenn (190) found the respiratory quotient of frog
muscle to be less than unity both while at rest and dur- 
ing exercise. This supports the findings of Himwich 
and Castle and Himwich and Rose. Rapport and Ralli 
(477) are also in agreement with these findings. The 
following is quoted from Physiological Abstracts, 1928, 
No. 1010: "It is suggested by the evidence that in mild 
exercise of short duration carbohydrate is not the sole 
source of energy and that fat is utilized, not to replen- 
ish carbohydrate stores, but by oxidation to supply en- 
ergy for muscular exercise. The muscles oxidize 
usually a mixture of fat and carbohydrate dependent 
upon what foodstuffs they receive, and their propor-
tion." Another recent contribution is that of Lindhard
(382), whose article is abstracted in Physiological Ab-
stracts, 1928, No. 2181, as follows: "A re-investigation
of the problem with precautions against interference 
with respiratory movements. Work was performed on 
a bicycle ergometer at what has been found to be the
optimum speed. The experimental results of A. V. 
Hill and his colleagues on this subject are subjected to 
criticism, and the author finds no support from his own 
experiments for the conclusion that muscular work of 
short duration is done entirely at the expense of car- 
bohydrate.” The subject is discussed by DuBois (172), 
page 46, who quotes Lusk (388) in opposition to Hill’s 
theory that carbohydrate alone furnishes the energy in
long-continued exercise. 

The dynamics of muscle action are treated by Hill in
several of his contributions, both popular and techni-
cal. Amar (10) also devotes one or two chapters to
this subject. More technical articles are those of Furu-
sawa, Hill, and Parkinson (204) and Hopkins (297).

Cathcart (130) traces how the modern theory of
protein metabolism came to reverse the older conception
(i.e., that protein is the source of energy for muscular work).
He shows that protein is concerned in muscular work 
nevertheless. There are about eighty references in the 
bibliography, giving the author and periodical but not 
the title. 

A very popular article on the elementary biochemis-
try of muscle action, although expressed in modern ter-
minology, is the article “Fatigue and Rest” (404), con-
tributed by Martin to Industrial Psychology.
The Physiology of Muscular Work. There is, of
course, considerable difficulty in separating the discus- 
sion of this topic from the discussion of the biochemis- 
try and dynamics of muscle action, treated in the pre- 
ceding section. Some of the more general writers such 
as Hill (281-284) and Bainbridge (30) may be re- 
ferred to just as appropriately here as they were above. 
It must be obvious that these two sections deal merely 
with two aspects of one fundamental problem and that 
discussions of the one will usually be phrased in terms 
of the other. Atzler (20) has written a concise review 
of the physiology of muscular work, with about fifty 
citations to the literature. Grafe (225) also has a use-
ful chapter on this subject with 69 titles in his bibli-
ography. The article by Riesser (483) appears to be
a general treatment of this subject, but was not avail- 
able to the reviewer. The Vienna Letter (578) in the 
Journal of the American Medical Association is a very 
sketchy report of Professor Rubner’s lecture on “Mod- 
ern Conceptions of the Physiology of Work." Magne 
(396) has written a good general account of respira- 
tory changes during muscular exercise, with consider- 
able tabular data quoted from other experimenters. 
Some of the contributors to Körper und Arbeit (19)
have written articles of a general nature. Herbst (269) 
and Mangold (400) may be cited as examples. 

Some of the most thorough investigations of the phy- 
siology of muscular exercise have been carried on by 
Benedict and his colleagues at the Carnegie Institution 
of Washington. The work of Benedict and Carpenter 
(69) on the influence of muscular and mental work on 
metabolism is classic in this field and carries a good
summary of earlier work on mental and muscular meta- 
bolism. The work of Smith (525) is another typically 
thorough Nutrition Laboratory study. Smith found 
that physiological adaptation to work occurs in large 
part within 30 seconds and is complete within 3 min- 
utes. Recovery was found to be not quite so prompt. 
Henderson, Dill, van Caulaert, Folling, and Coolidge
(262) have written a short article pointing out that the 
results of many experiments show convincingly that 
there is an analogy between the behavior of the human 
mechanism and that of a machine which adjusts itself 
to a steady state of work no less smoothly under a heavy 
load than when idling. It is possible for the human
machine to carry on smoothly while arterial blood, and 
therefore cell environment, remains approximately con- 
stant during heavy work. Other articles of a general 
nature are the contributions to the symposium in the In-
ternational Labor Review (552) and that of Knoll
(344) who studied internal and external respiration in 
certain of the major sports. 

Two or three studies on the relationship existing be- 
tween physical and mental work may be mentioned 
here. There is, of course, the study of Benedict and 
Carpenter quoted above, and there are several more 
recent studies of the energy cost of mental work. It is 
appropriate here to refer only to the studies of Day 
(157), Gillespie (215), and McDowall (419), all of
whom were concerned with the influence of both physi- 
cal and mental work on such processes as breathing and 
circulation. These references will be discussed in 
greater detail in the section devoted to mental work.

Early studies in the physiology of exercise utilized 
rather simple physiological measures, including res- 
piratory exchange determinations such as we are ad- 
vocating for widespread use in industry. Investigators 
working upon theoretical problems, however, have de- 
veloped methods and apparatus of greater and greater 
refinement for various technical purposes. Higley and 
Bowen (278) describe several different early methods, 
including their own (published in 1905). Krogh 
(349) discusses the accuracy of respiratory exchange 
determinations in experiments of very short duration. 
Kaup and Grosse (317) discuss older methods and 
make new suggestions. Boigey (97) illustrates his 
graphical methods of determining respiratory exchange 
during work. Further discussion of respiratory ex- 
change determination methods and apparatus will be 
reserved for a later section. Schneider (500) proposes 
a cardiovascular rating as a measure of physical fatigue 
and efficiency. Since this rating depends upon the 
effect of exercise on pulse rate and blood pressure, it 
seems pertinent to mention it here. Henderson and 
Prince (267) discuss the oxygen pulse and the systolic 
discharge. The oxygen pulse is defined and its use ex- 
plained in connection with the physiology of exercise. 
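As a minimal sketch, assuming the usual definition of the oxygen pulse as the oxygen consumed per heart beat (oxygen consumption per minute divided by pulse rate per minute), the quantity discussed by Henderson and Prince can be computed as follows. The function name and the readings are hypothetical and are not taken from their paper.

```python
def oxygen_pulse(o2_ml_per_min: float, pulse_per_min: float) -> float:
    """Return the oxygen consumed per heart beat, in ml of O2 per beat."""
    if pulse_per_min <= 0:
        raise ValueError("pulse rate must be positive")
    return o2_ml_per_min / pulse_per_min


# Hypothetical readings for one subject at rest and during work.
at_rest = oxygen_pulse(250.0, 70.0)
at_work = oxygen_pulse(1500.0, 140.0)
print(f"rest: {at_rest:.2f} ml/beat, work: {at_work:.2f} ml/beat")
```

A rise in the oxygen pulse during work means that each beat carries more oxygen to the tissues, through a larger systolic discharge, a greater extraction of oxygen from the blood, or both.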

The technical developments mentioned in the above 
paragraph are well illustrated by techniques which 
have recently been made available for determining the 
circulation rate in man. Lindhard (380) uses the 
nitrous oxide method in his study of circulation after 
cessation of work. Bock, Dill, and Talbott (89), in 
recent work from the Fatigue Laboratory, Morgan
Hall, Harvard University, and the Medical Labora- 
tories of the Massachusetts General Hospital, prefer the 
so-called “Haldane” method to either the nitrous oxide 
method or the ethyl iodide method. In the “Haldane” 
method the circulation rate is determined indirectly 
from the alveolar carbon dioxide tension. Bock, van
Caulaert, Dill, Folling, and Hurxthal (90) continue
the study cited above. They de-
termined blood flow, pulse, blood pressure, lactic acid
in the blood, carbon dioxide dissociation curves and 
Haldane analysis of expired air in a study of dynamical 
changes occurring in man at work. There is a bibli-
ography of 28 references. Dill, Hurxthal, van Caul-
aert, Folling, and Bock (161) demonstrate that the au-
tomatic sampling device used in the determination of
the rate of blood flow by the ethyl iodide method gives 
values much too low for the carbon dioxide pressure 
of arterial blood, and Dill, Lawrence, Hurxthal, and
Bock (162) show that Haldane-Priestley samples of
alveolar air collected during exercise at the beginning 
of expiration measure approximately the average car- 
bon dioxide pressure of arterial blood. Another care- 
ful experiment on hydrogen ion concentration of the 
blood and alveolar carbon dioxide tension is that of Ar-
borelius and Liljestrand (14). Hough (299) describes
an improvement in the Haldane method of collecting 
samples of alveolar air. This is an early study of
the physiology of muscular exercise. Goiffon (218) 
describes a recent device intended to simplify and stan-
dardize the taking of samples of alveolar air. A dis-
cussion of the determination of the circulation rate in
man at rest and in work has been written by Boothby
(100).

The technical studies of the physiological adjust- 
ment of the human body to muscular work are entirely 
too numerous to permit more than a casual sampling 
here. Cook and Pembrey (144) and MacKeith, Pem-
brey, et al. (395) have conducted elaborate studies of
the alveolar carbon dioxide tension, acidity of urine, 
circulation rate, and similar factors in heavy exercise 
in which “second wind” is ordinarily experienced. 
Cook and Pembrey report that “second wind appears 
to be an adjustment of the circulatory and respiratory 
systems to the demands of the muscles for an adequate 
supply of blood. Carbon dioxide is the chief factor 
in affecting the accommodation.” In the latter article 
by MacKeith, Pembrey, et al. this idea is amplified and
the adjustment explained principally in terms of dis- 
turbance and re-establishment of the acid-base equili- 
brium of the body. Similar studies of adjustment of 
the human body to muscular work are those of Talbott, 
Folling, Henderson, Dill, Edwards, and Berggren
(556), Barr, Himwich, and Green (34), Lundsgaard 
and Møller (386), Herxheimer and Kost (272), and
Gordon (222). Other technical studies of interest here
are those of Martin and Gruber (406), Douglas (166),
and Long (385).

Parts I and III of the series by Simonson (519, 521)
are similar in nature to the studies listed above. Part
IX of the series by Viale (575, 577) may also be
taken as an example of this type of investigation. Other 
special studies on the correlation between respiration,
circulation, and oxygen consumption during muscular 
work are those of Mobitz (427), Ledent (369), Cas-
sinis (128), and Boigey (27).

Studies in which the principal emphasis was placed 
upon the recovery period following muscular work are
those of Sargent (497), Campbell, Douglas, and Hob- 
son (121), Condero (142), Herxheimer, Wissing, and 
Wolff (273), and Liebenow (376). Sargent found that
recovery was extremely rapid, particularly for the first 
ten minutes after exercise. The rate was found to vary 
with the subject and the severity and duration of ex- 
ercise, however, so that total oxygen consumption dur- 
ing recovery cannot be estimated by applying a fixed 
correction to observed partial recovery. An extended 
and thoroughgoing account of the physiology con-
cerned, and especially of the rise in the respiratory quo-
tient, will be found in the reference by Campbell, Doug-
las, and Hobson.
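Sargent's finding implies that recovery oxygen must be measured over the whole recovery period rather than estimated from a partial record. A minimal sketch, assuming readings of oxygen consumption taken at known times after exercise and a separately determined resting level, sums the excess above rest with the trapezoidal rule. The integration rule, the function name, and all numbers are illustrative assumptions, not Sargent's procedure.

```python
from typing import Sequence


def excess_recovery_oxygen(
    times_min: Sequence[float],
    o2_l_per_min: Sequence[float],
    resting_l_per_min: float,
) -> float:
    """Liters of O2 consumed above the resting level during recovery.

    Consecutive readings are joined by straight lines (trapezoidal rule).
    """
    if len(times_min) != len(o2_l_per_min) or len(times_min) < 2:
        raise ValueError("need two or more paired time and O2 readings")
    total = 0.0
    for i in range(1, len(times_min)):
        interval = times_min[i] - times_min[i - 1]
        excess_before = o2_l_per_min[i - 1] - resting_l_per_min
        excess_after = o2_l_per_min[i] - resting_l_per_min
        total += 0.5 * (excess_before + excess_after) * interval
    return total


# Hypothetical recovery readings (minutes after exercise, liters O2 per minute).
times = [0, 2, 5, 10, 20]
oxygen = [2.0, 1.2, 0.7, 0.4, 0.3]
print(f"{excess_recovery_oxygen(times, oxygen, resting_l_per_min=0.3):.2f} liters")
```

Because the rate of recovery varied with the subject and with the severity and duration of exercise, the total here is determined by the readings themselves rather than by a fixed correction applied to part of the record.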

Studies having as an important aim the determina-
tion of the nature of the foodstuffs concerned in muscu-
lar work are those of Furusawa (201), Henderson and 
Haggard (265), and Marsh (402). These studies will
be considered again in connection with our discussion 
of the respiratory quotient and so will not be abstracted 
further here. The work of Curtis ( Hi ) may be taken 
as an example of a study of the influence of the thyroid 
gland on working metabolism. Simonson (518) has
studied the effect of forced breathing on recovery from
muscular work.

The following studies may be mentioned as being of 
interest in a practical way as well as in their theoretical
contributions: Benedict and Cathcart (72), a study of 
the efficiency of the human body as a machine; Atzler, 
Herbst, Lehmann, and Muller (24), studies of various 
methods and postures used in lifting weights; Atzler, 
Betke, Lehmann, Sachsenberg, et al. (21), studies on
work and fatigue; Atzler (19), mentioned previously 
as a useful symposium; Hansen (248), a determination 
of optimum speed of work on a bicycle ergometer; Hill
and Campbell (289), a study of the effect of atmo-
spheric cooling upon efficiency during work; and Wal-
ther (605), on the “techno-psychology” of work (not
available to the reviewer).

The Respiratory Quotient. The respiratory quotient 
has already been discussed in connection with Hill’s 
view that carbohydrate is the source of energy in mus-
cular work. Several studies supporting this view were
cited as well as a number in opposition to it. It is es- 
sential now to consider the matter in more detail as it 
is necessary to understand the use of this ratio as an in- 
dex of the energy value of the oxygen consumed in work 
experiments. These matters are amply discussed in 
a number of manuals listed in a preceding section, for 
example, Chapter IV of King (321) and several chap-
ters in DuBois (172). Other general discussions of the
significance of the respiratory quotient during and fol- 
lowing muscular work are found in Amar (7) and in 
Campbell, Douglas, and Hobson (121). 

The most important recent developments have been 
in the direction of demonstrating the great complexity 
of the factors which interact to produce the respiratory 
quotient as determined at any instant. Cathcart and
Markowitz (135) show that the ratio between the
volumes of carbon dioxide and oxygen represents
not a single, relatively simple physiological phenome-
non, but the sum of an infinitely large, rela-
tively unknown series of phenomena. Conybeare and
Pembrey (143) also contend that the respiratory quo- 
tient is a resultant of many factors and that the older 
view that it is composed only of varying contributions 
of the three classes of foods (R.Q. 0.7 for fat, R.Q. 0.8 
for protein, and R.Q. 1.0 for carbohydrate) is no longer 
tenable. The component above R.Q. 1.0 can be ex-
plained by the conversion of carbohydrate into fat and
the component below 0.7 can be explained by the con-
version of fat into carbohydrate. Fries (198) finds that
the respiratory quotient fluctuates widely throughout 
the day and that the average of a few short period tests 
does not always represent the daily respiratory quo- 
tient. Furusawa (201) showed that it is impossible to 
determine from the respiratory quotient what substance 
is being oxidized in the muscle itself during exercise. 
Hendry, Carpenter, and Emmes (268) declare that 
when basal metabolism only is desired it is not essential 
to determine the respiratory quotient but that it must 
never be neglected when work of a scientific nature is 
being done. Knipping (331), on the other hand, insists
upon the necessity for determining the respiratory quo- 
tient in routine basal metabolism tests as well as in all 
other experimental investigations. 
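The older view that Conybeare and Pembrey consider untenable amounts to a simple mixing rule, and writing it out shows where it breaks down. The sketch below is illustrative only: it ignores protein, uses the round values 0.7 for fat and 1.0 for carbohydrate quoted above, and assigns a fraction of the oxygen to each fuel. The function names are hypothetical. It refuses quotients outside 0.7 to 1.0, which are the very values the text explains by conversion of one foodstuff into another.

```python
# Round values used in the older three-foodstuff view; protein is ignored here.
FAT_RQ = 0.7
CARBOHYDRATE_RQ = 1.0


def respiratory_quotient(co2_volume: float, o2_volume: float) -> float:
    """Volume of CO2 given off divided by volume of O2 absorbed (same units)."""
    return co2_volume / o2_volume


def carbohydrate_share_of_oxygen(rq: float) -> float:
    """Fraction of the O2 attributed to carbohydrate in a fat/carbohydrate mixture.

    Outside 0.7 to 1.0 no such mixture exists, and conversion of one
    foodstuff into another (or some other factor) must be invoked instead.
    """
    if not FAT_RQ <= rq <= CARBOHYDRATE_RQ:
        raise ValueError(f"R.Q. {rq:.2f} cannot be split between fat and carbohydrate")
    return (rq - FAT_RQ) / (CARBOHYDRATE_RQ - FAT_RQ)


rq = respiratory_quotient(co2_volume=0.85, o2_volume=1.00)
print(f"R.Q. {rq:.2f}: {carbohydrate_share_of_oxygen(rq):.0%} of the oxygen to carbohydrate")
```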

Benedict, Emmes, and Riche have been quoted by 
DuBois (172) as showing an R.Q. of .88 after a meal 
rich in carbohydrate as compared with .82 as a nor-
mal basal. Many experiments since that date have in-
dicated an effect of the preceding diet on the respira- 
tory quotient even after digestion has ceased, and 
Marsh (403) in a recent study showed that “the res- 
piratory quotient for the excess metabolism in moderate 
work is more constant if the diet is controlled than 
otherwise. On mixed diet it is .95, on carbohydrate 
it rises, and on fat diet it is .83-.80. The net efficiency
of a subject first on mixed, then on carbohydrate and 
lastly on fat diet decreased during the fat diet slowly up 
to the eleventh day, when it fell markedly.” The above 
is quoted from Physiological Abstracts, Volume 12, 
No. 3890. 

A short article by DuBois (173) is available, show- 
ing graphically the respiratory quotient and the per- 
centage of calories furnished by protein, fat, and car- 
bohydrate. Smart (524) reproduces a slide rule for 
the calculation of the respiratory quotient and supplies 
the mathematics of its construction. Further details 
of the use of the respiratory quotient in determining 
energy consumption will be given in the next section. 

Hill's contention that glycogen supplies the energy of 
muscular work has already been discussed in another 
connection but should be more amply treated here, 
through the citation of the more significant studies in a 
more or less chronological order. Cook and Pembrey 
(144) found the respiratory quotients of their men at
rest ranged from .75 to 1.03, with a mean at .90, and 
that after exercise they ranged from .81 to 1.37, with a 
mean of 1.00. This study, which would appear to
support Hill’s views in part, was followed by a similar
investigation by MacKeith, Pembrey, et al. (395) about
ten years later. Krogh and Lindhard (356) found that
the respiratory quotient rose rapidly to or above unity 
at the beginning of heavy work, and the same authors 
(357) found a higher mechanical efficiency under car-
bohydrate fuel. Krogh (351) has written a short sum-
mary of this later study. DuBois (174) also supports
Hill’s views, and in the very recent work of Bock, van
Caulaert, Dill, Folling, and Hurxthal (91) the respira-
tory quotient was found to be fairly constant at about
.95. This would indicate that glycogen is the princi-
pal immediate source of energy of muscular contrac-
tion.

The studies of Himwich and Castle (292) and Him- 
wich and Rose (293) have already been mentioned as 
being contradictory to Hill’s view. In the same dis-
cussion mention was also made of the work of Fenn
(190), Rapport and Ralli (477), Lindhard (382), and
Lusk (388). To these studies opposing carbohydrate
as the sole source of muscular energy should be added
the early study of Morgulis (434), who believed that
respiratory quotients of 1.00 and over obtained during
muscular exercise should be interpreted as indications
of faulty technique rather than as indicating the use of
pure carbohydrate or the transformation of glycogen
into fat, as would be implied for a respiratory quotient
over 1.00. He accepts Zuntz’s work on the respiratory
quotient in which the respiratory quotient was found to 
be unaffected by muscular activity. Henderson and 
Haggard (265) also report of their experiment “that 
the most significant result of these observations is the
conclusive evidence which they afford that in whatever 
proportion fat and sugar are being burned during rest 
just before the exercise, they are burned in nearly the 
same proportion to produce the energy for doing work 
or for the recovery process in the muscles.” Marsh 
(402) also supports Zuntz’s view that both fat and car- 
bohydrate are used for energy in muscular work. A 
study of the mechanical efficiency of the body on car- 
bohydrate, fat, and mixed diets was made by Severing- 
haus, Reynolds, and Stark (514), who found that the 
net efficiency was the same on each diet, although there 
was ketosis under the fat diet and fatigue was produced 
during work. This work may be taken as indirect evi- 
dence that the body may use varying proportions of 
these fuels indiscriminately without the necessity of 
conversion to carbohydrate. Furthermore, Wilson, 
Levine, Rivkin, and Berliner (617) found that the res- 
piratory quotient of the extra metabolism due to ex- 
ercise was uniformly less than unity. The reference to 
Van Slyke (566) on The Relation of Carbon Dioxide 
and Oxygen is of significance principally to physicians 
and is mentioned in discussing the respiratory quotient 
only because of the apparent pertinence of the title. 

Calorie Computation. The reader who has achieved 
a good working knowledge of the physiology of mus- 
cular exercise as treated in the various references given 
above should now be in a position to consider the prob- 
lem of computing the energy exchange of an experi- 
mental period in terms of calories consumed. We have 
considered the significance of the respiratory quotient 
and various other factors entering into the computation
of energy exchange. There still remain the subjects of 
surface area and certain other factors playing a part in 
these computations, but these matters may be more ap- 
propriately deferred to a later section, since a super- 
ficial understanding of them is sufficient for the purpose 
at hand. 
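Once the respiratory quotient is known, the energy exchange follows from the oxygen consumed and the caloric value of a liter of oxygen at that quotient. A minimal sketch, assuming protein is neglected and interpolating linearly between the values commonly tabulated for pure fat (4.686 kilogram-calories per liter at R.Q. 0.707) and pure carbohydrate (5.047 at R.Q. 1.00); the published tables in the manuals cited in this section remain the authority, and the oxygen volume is assumed to be already reduced to 0° C, 760 mm., and dry. Function names are hypothetical.

```python
# Tabulated end points (R.Q., kilogram-calories per liter of O2) for pure fat
# and pure carbohydrate; intermediate values are linearly interpolated here.
FAT_POINT = (0.707, 4.686)
CARBOHYDRATE_POINT = (1.000, 5.047)


def kcal_per_litre_o2(rq: float) -> float:
    """Approximate caloric value of one liter of O2 at a non-protein R.Q."""
    rq_fat, kcal_fat = FAT_POINT
    rq_cho, kcal_cho = CARBOHYDRATE_POINT
    if not rq_fat <= rq <= rq_cho:
        raise ValueError(f"R.Q. {rq:.3f} is outside the tabulated range")
    fraction = (rq - rq_fat) / (rq_cho - rq_fat)
    return kcal_fat + fraction * (kcal_cho - kcal_fat)


def energy_kcal(o2_litres_stpd: float, rq: float) -> float:
    """Heat production for a period, from O2 consumed (reduced volume) and R.Q."""
    return o2_litres_stpd * kcal_per_litre_o2(rq)


# Hypothetical period: 15.0 liters of O2 (reduced) at an R.Q. of 0.82.
print(f"{kcal_per_litre_o2(0.82):.3f} Cal per liter O2")
print(f"{energy_kcal(15.0, 0.82):.1f} Cal for the period")
```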

The general manuals cited in the section on Basal 
Metabolism may be referred to for guidance in com- 
puting the energy exchange from experimental data. 
DuBois (172) and Grafe (225) may be cited again in 
this connection. Sanborn (496) also treats this topic 
briefly. Janet (305) contributes a general discussion 
of computation and supplies certain useful tables, nom- 
ograms, and formulae. His height-weight surface area 
nomogram has been reproduced by Wardlaw (607a).
Amar (10) describes the computations in work experi-
ments briefly.
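A height-weight nomogram of the kind Janet supplies is a graphic means of evaluating a surface-area formula. As an illustration only, assuming the DuBois height-weight formula (the one described later in this section as generally accepted), A = 0.007184 × W^0.425 × H^0.725, with W in kilograms, H in centimeters, and A in square meters, the same reading can be computed directly. Which formula Janet's own nomogram encodes is not stated here, and the function name is hypothetical.

```python
def dubois_surface_area(weight_kg: float, height_cm: float) -> float:
    """Body surface area in square meters by the DuBois height-weight formula."""
    return 0.007184 * weight_kg ** 0.425 * height_cm ** 0.725


# Hypothetical subject: 70 kg, 175 cm.
print(f"{dubois_surface_area(70.0, 175.0):.2f} square meters")
```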

The most complete treatments of this subject are
found in articles primarily concerned with some of the 
open circuit methods of determining respiratory ex- 
change. The theoretical and practical differences be- 
tween open circuit and closed circuit methods will be 
treated in a later section, since the computation methods in-
volved may be understood and followed without tech-
nical knowledge of the distinctions between these two
broad lines of experimental approach. Henderson 
(264) appends a good set of student laboratory instruc-
tions to his description of respiratory exchange deter-
mination by means of typical open circuit methods.
Cathcart (131) has written an elementary statement of 
the theory of indirect calorimetry which would be con-
sidered somewhat crude and out of date at the present
time, but which contains condensed instructions con-
cerning the methods of computation and carries cer-
tain useful conversion tables as well. Dautrebande and
Davies (155), Labbe and Stevenin (359), and Booth- 
by and Sandiford (104) also describe open circuit 
methods and illustrate the computations involved. 
Klein and Steuber (325), in their discussion of various 
chemical absorption methods for determining oxygen 
and carbon dioxide, treat methods of calculation in de- 
tail and supply several useful tables. The reference 
to Wheeler (611) on Measuring the Energy Cost of
Work is disappointing in this connection as it consists 
of an inaccurate and misleading description of the 
Douglas Bag method with detailed quotations from two 
studies by Langworthy and Barott, with no acknowl- 
edgment made to the original investigators. 

The computations of energy expenditure from re- 
sults obtained by means of closed circuit apparatus have 
been treated by Benedict and Tompkins (82) and others 
who will be mentioned when we consider closed cir- 
cuit methods in greater detail. 

Stoner (543, 544) describes a simplified data blank 
and simplified calculations for use with open circuit 
apparatus. Newcomer (447) has prepared tables and
charts by which the basal metabolic rate may be cal- 
culated by simply adding five numbers. 
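Whatever tables or charts are used, the clinical result is reduced to one figure: the observed heat production per square meter per hour, expressed as a percentage above or below a standard for the subject's age and sex. A minimal sketch of that final step follows. The standard is passed in as a parameter because the choice among published standards is itself in question, and the function name and figures are hypothetical.

```python
def metabolic_rate_percent(
    calories_per_hour: float,
    surface_area_m2: float,
    standard_per_m2_hour: float,
) -> float:
    """Percentage above (+) or below (-) the chosen standard."""
    observed_per_m2 = calories_per_hour / surface_area_m2
    return 100.0 * (observed_per_m2 - standard_per_m2_hour) / standard_per_m2_hour


# Hypothetical figures; the standard must be taken from the published tables.
print(f"{metabolic_rate_percent(88.0, 2.0, 40.0):+.0f} per cent")
```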

Several types of nomographic charts are available. 
Boothby and Sandiford (106) discuss the mathematics 
of constructing such charts and publish a variety of 
them, as well as correction and reduction tables and a
sample of the Mayo metabolic blank. "A series of 
nomographic charts is described by which the calcu- 
lation of basal metabolic rate by the gasometer method 
can be made graphically in less than five minutes with-
out the use of logarithms.” Smith and Smith (527)
also publish graphs for use in determining basal meta-
bolic rates. Dill and Folling (160) say of their nomo-
graphic charts that “one joins with a thread the per-
centage of carbon dioxide and of oxygen found in the
expired air. The respiratory quotient and percentage 
of oxygen used can be read directly with an error of 
less than one in five hundred.” Kommerell (345) also
publishes nomographic charts for use with the Doug-
las Bag method. 
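Read numerically rather than graphically, the Dill and Folling chart takes the percentages of carbon dioxide and oxygen in expired air and returns the respiratory quotient and the oxygen used. The sketch below is not their construction. It assumes the usual nitrogen (Haldane) correction, the customary outdoor-air composition of 20.93 per cent oxygen, 0.03 per cent carbon dioxide, and 79.04 per cent nitrogen, and it interprets "percentage of oxygen used" as oxygen absorbed per 100 volumes of expired air. The function name is hypothetical.

```python
# Composition of outdoor air, volumes per cent (customary values).
INSPIRED_O2 = 20.93
INSPIRED_CO2 = 0.03
INSPIRED_N2 = 79.04


def analyse_expired_air(expired_co2: float, expired_o2: float) -> tuple[float, float]:
    """Return (O2 absorbed per 100 volumes of expired air, respiratory quotient).

    Nitrogen is treated as inert, so the inspired volume corresponding to
    100 volumes of expired air is 100 * expired_N2 / INSPIRED_N2.
    """
    expired_n2 = 100.0 - expired_co2 - expired_o2
    scale = expired_n2 / INSPIRED_N2
    o2_absorbed = INSPIRED_O2 * scale - expired_o2
    co2_given_off = expired_co2 - INSPIRED_CO2 * scale
    return o2_absorbed, co2_given_off / o2_absorbed


# Hypothetical Douglas bag sample: 4.00 per cent CO2 and 16.50 per cent O2.
o2_used, rq = analyse_expired_air(4.00, 16.50)
print(f"O2 absorbed: {o2_used:.2f} per cent of expired air, R.Q. {rq:.3f}")
```

Multiplying the oxygen absorbed per cent by the reduced volume of expired air collected in the bag gives the oxygen consumption for the period.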

Hollingsworth (295) makes a plea for wider use of 
the slide rule by physicians and describes a method of 
determining basal metabolic rate on a circular slide 
rule. Smart (524) also reproduces a slide rule for the 
calculation of respiratory quotients and supplies the 
mathematics of its construction. 

Useful tables to be used in the computations have 
been supplied by a number of authors, some of whom 
have already been mentioned. Haldane (242) supplies 
conversion tables of various kinds; Knipping and Kow- 
itz (342) supply several useful tables, including one 
giving the logarithms for reduction to 0° C and 760 mm.
pressure; Gauss (208) publishes a table combining the 
corrections for barometric pressure, room temperature, 
brass scale expansion, and vapor tension; Carpenter 
(125) assembles tables for reduction to 0° C and 760
mm. pressure, for estimating body surface, the latest
basal standards, and factors for converting various units 
of energy from the one to the other. Roth (485) and 
Krauss (348) also publish tables which are useful in 
metabolic rate determination. 
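The reduction tables described above tabulate a correction that can also be written as one formula: a gas volume read at temperature t and barometric pressure B, saturated with water vapor of tension w, is multiplied by (B − w)/760 and by 273/(273 + t). A minimal sketch follows. It assumes the barometer reading has already been corrected for brass-scale expansion (one of the corrections combined in Gauss's table) and takes the vapor tension from the caller rather than from a table; the function name and reading are hypothetical.

```python
def reduce_to_standard(
    volume: float, temperature_c: float, barometer_mm: float, vapor_tension_mm: float
) -> float:
    """Reduce a moist gas volume to 0 deg C, 760 mm Hg, and dry."""
    dry_pressure = barometer_mm - vapor_tension_mm
    return volume * (dry_pressure / 760.0) * (273.15 / (273.15 + temperature_c))


# Hypothetical reading: 100 liters at 20 deg C, barometer 750 mm,
# saturated with water vapor (about 17.5 mm at 20 deg C).
print(f"{reduce_to_standard(100.0, 20.0, 750.0, 17.5):.1f} liters reduced")
```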

Other charts and tables of some interest here are 
those of DuBois (173) and the Royal Society of Lon- 
don (492). DuBois says of his chart that “almost all 
the phenomena of respiratory metabolism can be rep- 
resented on this metabolic map and we can follow the 
changes which result from the ingestion of protein, fat 
or carbohydrate.” The material supplied by the Royal 
Society of London is concerned particularly with food 
requirements under various conditions in terms of calo- 
ries. 

One other reference of value here is that of Gephart, 
DuBois, and Lusk (210) who contend that in meta- 
bolic work the analytical error is seldom much less than 
one per cent and, since a variation of one per cent is of 
little significance, it is unnecessary to publish more than 
three significant figures in the data of metabolism ex- 
periments. 
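The rule of Gephart, DuBois, and Lusk can be applied mechanically: with an analytical error near one per cent, digits beyond the third carry no information. A small helper, a hypothetical illustration rather than anything the authors published, rounds a result to three significant figures before it is reported.

```python
from math import floor, log10


def round_significant(value: float, figures: int = 3) -> float:
    """Round a value to the given number of significant figures."""
    if value == 0:
        return 0.0
    decimals = figures - 1 - floor(log10(abs(value)))
    return round(value, decimals)


print(round_significant(1687.4))   # 1690.0
print(round_significant(0.82347))  # 0.823
```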

Factors Affecting Metabolism. We have already
seen that the total metabolism consists essentially of the 
individual’s basal metabolism plus the metabolism of 
the muscular exercise in which he engages. If there 
were no other factors than these influencing the meta- 
bolic rate it would be a relatively simple matter to de- 
termine the energy requirement for any specific type of 
muscular activity by determining the total metabolism 
while the individual is engaged in this activity and de-
ducting from that total the basal metabolism, previ-
ously determined. Actually, however, the metabolic
rate is influenced by a large number of factors both 
physiological and environmental, so that it is impos- 
sible to conduct significant experimentation in work 
metabolism without knowledge of the possible influence 
of these factors in order that they may be either controlled
or observed in work experiments. It is impossible
to classify all of these factors in metabolism
in any rigid sort of way, but as a matter of convenience 
we shall group them under three principal heads. The 
first group will consist of those somatic and intra-organic
factors which are usually subject to observation
but which are beyond the control of the
experimenter. The second group are somatic and 
intra-organic factors usually subject both to observa- 
tion and control, and the third group will comprise 
those factors which may be thought of as environmental. 
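The simple subtraction described at the start of this section can be sketched as follows, bearing in mind that it is only valid when the factors just enumerated are controlled or observed. To turn gas exchange into calories the sketch uses the abbreviated Weir equation, a later convention swapped in here for illustration and not a method used by the authors reviewed; the function names and all figures are hypothetical.

```python
def kcal_from_gas_exchange(o2_l: float, co2_l: float) -> float:
    """Energy (kcal) from liters of O2 consumed and CO2 produced.

    Assumption: abbreviated Weir equation, urinary nitrogen term omitted.
    """
    return 3.941 * o2_l + 1.106 * co2_l


def net_cost_of_work(work_o2_l: float, work_co2_l: float,
                     basal_o2_l_per_min: float, basal_co2_l_per_min: float,
                     minutes: float) -> float:
    """Total metabolism during the work period minus the basal metabolism
    for the same period, the basal rate having been determined beforehand."""
    total = kcal_from_gas_exchange(work_o2_l, work_co2_l)
    basal = kcal_from_gas_exchange(basal_o2_l_per_min * minutes,
                                   basal_co2_l_per_min * minutes)
    return total - basal


if __name__ == "__main__":
    # Hypothetical 10-minute task: 15.0 L O2 and 13.0 L CO2 collected;
    # basal rate previously measured at 0.25 L O2 and 0.20 L CO2 per minute.
    print(round(net_cost_of_work(15.0, 13.0, 0.25, 0.20, 10), 1))
```

Any uncontrolled factor that shifts the metabolic rate during the work period (food, posture, emotion) is silently charged to the task by this subtraction, which is why the factors discussed below matter.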

The most important of the first group of somatic 
factors, whether one is computing basal metabolic rates 
or is engaged in studies of working metabolism, is that 
of the weight and surface area of the body. This has 
already been mentioned in connection with calorie com- 
putation but will be treated in more detail here. 

In addition to the information in the various man- 
uals which have been repeatedly referred to, we may 
cite the recent comparison which Stoner (545) has
made of the various formulae. The Dreyer standards
are recommended but the point is made in his article 
that the choice of standards in the determination of 
basal metabolic rates is coming to be of decreasing 
significance since such determinations are now made
for purposes of therapeusis rather than diagnosis. The 
DuBois formula (175) seems to be rather generally 
accepted and tables of values of the DuBois surface 
area formula have been worked out by Stoner (542)
in a convenient form for reference. Bradfield (111), 
as abstracted in Physiological Abstracts, Volume 13, 
No. 603, finds that “the average basal metabolism for 
women is six per cent below that predicted by all stan- 
dards and is the same as that predicted by Krogh. 
Krogh’s modification of the Aub-DuBois standards 
gives correct results for women. If the DuBois height- 
weight formula is used a correction of plus two per cent 
should be made.” 
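The DuBois height-weight formula referred to above is usually written A = 0.007184 × W^0.425 × H^0.725, with A the surface area in square meters, W the weight in kilograms, and H the height in centimeters. A minimal sketch follows; the second function multiplies the area by a standard heat output per square meter per hour, which must be taken from the published standards (no standard value is assumed here), and both function names are hypothetical.

```python
def dubois_surface_area_m2(weight_kg: float, height_cm: float) -> float:
    """DuBois height-weight formula for body surface area (square meters)."""
    return 0.007184 * weight_kg ** 0.425 * height_cm ** 0.725


def predicted_basal_kcal_per_day(weight_kg: float, height_cm: float,
                                 kcal_per_m2_per_hour: float) -> float:
    """Surface area times a standard heat output per square meter per hour.

    The standard (which varies with age and sex) must be supplied from the
    published tables; none is assumed here.
    """
    area = dubois_surface_area_m2(weight_kg, height_cm)
    return area * kcal_per_m2_per_hour * 24


if __name__ == "__main__":
    # A hypothetical subject of 70 kg and 175 cm has about 1.85 square meters.
    print(round(dubois_surface_area_m2(70.0, 175.0), 2))
```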

Other general discussions and summaries of the work 
on surface area have been written by McCann (412), 
Murlin (441), Krogh (350), Hedon (258), and Booth- 
by and Sandiford (103). Boothby and Sandiford 
(105) have also presented metabolism data on the sub- 
jects going through their laboratory over a period of 
about five years. They concluded that the DuBois
standards were the best available at that time. 

Those interested in early controversies concerning the 
effect of body surface and weight upon heat production 
may note that Benedict and Smith (81) wrote in 1915 
that “it would thus appear that the increase in the meta- 
bolism noted with athletes points strongly towards the 
earlier conception that the catabolism of the body is 
proportional not to the surface of the body but to the 
active mass of protoplasmic tissue.” The question 
of the influence of the active protoplasmic mass has
been debated for many years, but Corlette (145), as
well as most modern workers, finds that the active pro- 
toplasmic tissue is not the controlling factor in basal 
metabolism. Kaup and Grosse (315) have written a 
careful survey of the literature on surface area, in which
the American literature on the subject is criticized as 
being based upon cases with too little homogeneity. 
They report their own study on subjects of uniform 
physical characteristics and they minimize the effect of 
body size and weight which they regard as minor vari- 
ables. They find in Darwin and Lamarck a basis for 
a broader treatment of the subject in terms of the pre- 
servation of species. 

Sex and age are other somatic factors beyond the con- 
trol of the experimenter. Benedict (61) has written a 
general article on age and basal metabolism, and Murlin
(442) and Talbot (555) have written general reviews
of metabolism in infancy and childhood.
Gottsche (223) finds a specific reaction in puberty, and 
Fleming (192) suggests that the high basal metabolism 
of the growing child may be partly accounted for by 
the energy expended in the manufacture of new tissue. 
Investigators in industry are not likely to find age a 
highly important variable among the subjects ordi- 
narily dealt with. 

Another non-controllable organic factor which has 
attracted considerable attention in recent years is that 
of the influence of race. These studies have related 
principally to racial differences in basal metabolism 
rather than being concerned with possible differences 
in the energy cost of equivalent tasks performed by persons
of different races. In general, it appears that the
basal metabolic rate of northern groups is in excess 
of that of residents of more tropical climates. The dif- 
ferences are not large and may be obscured by the rela- 
tively wider range of individual differences. Recent 
experimental determinations of basal metabolic rate 
among various races are those of Benedict (57), Turn- 
er (565), Heinbecker (259), Steggerda and Benedict 
(538), and Williams and Benedict (613). The matter 
of individual differences in basal metabolic rate is con- 
sidered in most of the general manuals on the subject. 
Knoll (344) has studied individual differences in the 
consumption of oxygen at work. 

From the standpoint of working metabolism experi- 
mentation one of the most important physiological fac- 
tors, subject to control in some instances and report- 
able in all cases, is the effect of differences in the ex- 
tent of previous muscular training. Schneider, Clark, 
and Ring (505) have shown that such differences in the
state of training have slight effect upon the basal metabolic
rate, although a large number of studies
indicate an appreciable effect on work metabolism.
Waller and DeDecker (591), studying carbon dioxide 
production, found indications of doubled efficiency in 
walking in the trained as compared with untrained 
state. Mague (552), in his contribution to the sympo- 
sium on industrial psychology, found that persons ac- 
customed to muscular work are able to eliminate higher 
concentrations of carbon dioxide, thereby economizing 
pulmonary ventilation. Briggs (113) has also noticed
the effect of training in economizing pulmonary ventilation
through utilizing a higher percentage of oxygen
from the air and excreting a higher percentage of
carbon dioxide. He states that the normal percentage
may even be doubled in the case of certain subjects.
Simonson (522) offers evidence for the conclusion that 
exercise results both in an increased ability on the part 
of the muscles to remove lactic acid and a greater pul- 
monary economy in carbon dioxide elimination, and 
Liebenow (376) states that the effect of training is in 
the direction of increasing “the restitution constant.”
By this is meant that the oxygen debt is eliminated more 
quickly and the body reaches its normal carbon dioxide 
balance more rapidly. Hartwell and Tweedy (253) 
found only slight differences in pulmonary economy 
as between athletic and non-athletic women. Krogh 
(351) and Rosenheim (484) both speak of increases 
in mechanical efficiency as a result of practice in the 
task. 
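The economy in pulmonary ventilation described by Mague and by Briggs follows from a simple balance: for a given carbon dioxide output, the volume of air that must be breathed is inversely proportional to the rise in carbon dioxide percentage between inspired and expired air, so doubling the expired percentage roughly halves the ventilation. The sketch assumes an inspired fraction of 0.03 per cent (a typical atmospheric value) and ignores the small difference between inspired and expired volumes; the function name and the figures are hypothetical.

```python
def ventilation_l_per_min(co2_output_l_per_min: float,
                          expired_co2_fraction: float,
                          inspired_co2_fraction: float = 0.0003) -> float:
    """Volume of air that must be expired to eliminate a given CO2 output.

    Assumes inspired and expired volumes are equal (a simplification).
    """
    rise = expired_co2_fraction - inspired_co2_fraction
    if rise <= 0:
        raise ValueError("expired CO2 fraction must exceed inspired fraction")
    return co2_output_l_per_min / rise


if __name__ == "__main__":
    co2 = 1.0  # hypothetical liters of CO2 eliminated per minute during work
    untrained = ventilation_l_per_min(co2, 0.03)  # 3 per cent CO2 in expired air
    trained = ventilation_l_per_min(co2, 0.06)    # the percentage doubled
    print(round(untrained, 1), round(trained, 1))
```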

Attempts to study the physiology of this increase in 
efficiency have been made by Simonson and Reisser
(523), an abstract of whose study is quoted from Psychological
Abstracts, Volume 1, No. 1315: “Training is
physiologically defined as an increased output ability
obtained by exercise. Recuperation is improved by repeated
exercise of the function, although there is no
reduction in the energy consumption with practice.
This supports Reisser’s view of training effects as due
to an habituation to the toxin.” Hietanen, Nikkiiien,
Nygssdla, and Sternberg (276) found an increase in 
efficiency during the first hour of walking on a slippery 
surface. This can probably be accounted for in terms 
of the acquisition of skill rather than requiring a physiological
explanation. Briggs (112) reports that “physical
work is found by experience to be easier for unfit
men when oxygenated air is breathed but no such dif- 
ference is to be observed with fit men.” This was not 
confirmed in the earlier study by Hartwell and Tweedy, 
already quoted, but would be theoretically significant 
if clearly established. Other studies of interest in con- 
nection with the physiology of training are those of 
Loewy and Knoll (384), Kaup and Grosse (316), and 
Knoll (344). 

The experimenter using respiratory exchange deter- 
mination methods in industry will ordinarily need to 
have but little concern for the effect of pathological 
conditions, but it is necessary to mention here that there 
are certain endocrine disturbances, such as hyperthy- 
roidism, which may have a serious effect upon the total 
metabolism and that certain other disease states and 
atypical conditions should be observed and recorded if 
present. Bowen and Carmer (110) found that the
added energy consumption for the obese as compared 
with the normal is only about what would be expected 
for moving the excess weight, although Wang, Strouse, 
and Smith (607) state that the heat production was 
greater in the obese than in normal or thin subjects and
lowest in the normal subjects. They also found the
mechanical efficiency greatest in the normal and least
in the obese. Benedict (62) has studied individuals of 
unusual physical configuration, and Frank and Herzger
(197) have studied oxygen consumption and respiratory
quotient under various conditions of nutrition
and bodily health.

Moller (428) has written an extensive study of basal
metabolism in diseases of the thyroid gland and fur- 
nishes a bibliography of general nature, consisting of 
about 150 references, by author and journal. Boothby 
and Sandiford (102) report investigations indicating 
that thyroid cases require considerably more energy for 
a given piece of work than the normal. Others who 
have studied the effect of thyroid disturbance on work 
metabolism are Smith (526) and Curtis (151). The
reference to Biedle (84) was retained in this bibliography
through oversight, as it is of clinical significance
only.

Peabody and Sturgis (460, 461) contribute hospital 
studies of heart disease of some significance to the student
of respiratory metabolism. Langworthy and
Barott (365) made the interesting discovery that, for
five weeks after recovery from an attack of influenza
of three weeks’ duration, the energy expenditure per
kilogram of body weight was reduced by 4 per cent from
the former requirement when doing the same amount
of external work. McCann (413) found that the total
pulmonary ventilation of five advanced tuberculosis 
cases was double that of the normal controls. 

Owen, Cope, and Hill (454) called attention to a 
case of unsatisfactory metabolism test resulting from 
leakage of air into ears and Eustachian tubes as a result 
of perforation of ear drums. Such instances are not 
likely to occur in industry frequently enough to require
special precautions. There are interesting and
sometimes surprising effects from other pathological
conditions of the organism, but these will not ordinarily 
interest readers of this monograph. Those wishing 
further material concerning the effect of diabetes, various
febrile states, epileptic contractures and various
seizures, hypnotic rigidity, and other miscellaneous
special conditions may be referred to the general manual
by Grafe (225).

An important question both to the worker in basal 
metabolism and metabolism of muscular exercise is that 
of possible daily fluctuations due to undiscoverable organic
factors. Such variations, if of great magnitude,
would seriously affect the results of all types of meta- 
bolism experiments, since the factors producing them 
would be subject neither to observation nor to control. 
Wishart (621) has written on “The Variability of 
Basal Metabolism,” and this has been abstracted in
Physiological Abstracts, Volume 12, No. 1012: “The
day to day variability in both metabolism and respiratory
quotient (estimated by analysis of expired air collected
in a Douglas bag) may be expressed by a coefficient
of variation of four or five, i.e. the minimum and
maximum of a series of observations in a single individual
may differ by as much as 30 per cent. The variability
is increased if the protein in the diet varies
considerably, and decreased if the subject adheres to a
very strict daily routine. Perfectly normal people may
show basal metabolic rates of 20 per cent below DuBois
standard.” Hafkesbring and Collett (239) found
daily variations of about 5 per cent in both directions 
with maximum variations about double that. Kunde 
(358) also found indications of seasonal and daily variations.
Lusk and DuBois (393) find that a man thirty
or forty years of age may maintain a basal metabolic 
rate during eleven years within a variation of plus or 
minus 7.6 per cent. A sedentary existence was found to 
have the effect of reducing the metabolic rate. Harris 
and Benedict (252) have made a careful statistical
study of the problem of variation in basal metabolism. 
It is interesting here to note that Hendry, Carpenter, 
and Emmes (268) found that post-absorptive oxygen 
consumption is uniform between 8:30 A.M. and 12:30 
P.M. 
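Two statistics recur in these variability studies: the coefficient of variation (the standard deviation as a percentage of the mean) of a subject's repeated determinations, and a basal rate expressed as a percentage above or below a standard. The sketch below computes both with Python's standard library; the daily values and the standard of 40 are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the studies cited do not say which form they used.

```python
from statistics import mean, stdev
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


def percent_from_standard(measured: float, predicted: float) -> float:
    """Basal rate expressed as per cent above (+) or below (-) a standard."""
    return 100.0 * (measured - predicted) / predicted


if __name__ == "__main__":
    # Hypothetical basal heat production, calories per hour, on six mornings.
    daily = [68.0, 71.5, 66.2, 70.1, 73.4, 69.0]
    print(round(coefficient_of_variation(daily), 1))
    # Hypothetical: 32 calories per square meter per hour against a standard of 40.
    print(percent_from_standard(32.0, 40.0))  # -20.0
```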

Hafkesbring and Collett, quoted above, found basal 
metabolism 5 per cent higher in cold weather than in
hot, and Griffith, Pucher, Brownell, Carmer, and
Klein (233) verified this at least in part in finding oxygen
consumption and pulse rate higher in winter than
in summer, although carbon dioxide production showed
no seasonal change. On the other hand, Gustafson 
and Benedict (237) report that “the average values 
for the oxygen consumption strongly suggest that the 
metabolism tends to be at a low level in the winter and 
to rise to a higher level during the spring and sum- 
mer.” Rowe and Eakin (491) have accumulated evi- 
dence that there may be a metabolic curve both in men 
and in women associated with some sort of sexual gland 
cycle. Lindhard (381) links seasonal periodicity in
respiratory exchange to the varying intensity of the
sunlight.

Another organic factor which might, at first thought, 
be supposed to play an important part in total metabolism
is that of menstruation. It appears that the influence
of menstruation upon metabolism has been the
subject of controversy and experimental disagreement 
for many years. The most definite conclusion one can 
reach from studying the literature is that, whatever 
differences there may be in metabolic rate brought 
about through menstruation, such differences are 
within plus or minus 5 per cent and may probably be
considered negligible in ordinary industrial experi- 
ments. Kunde (358) and Hafkesbring and Collett 
(239) found a slight lowering in basal metabolic rate 
during the first few days of menstruation, and Rowe 
and Eakin (491) found a metabolic rise in the pre- 
menstrual week, while Blunt and Dye (87), in an ex- 
tensive study, found no elevation in basal metabolism 
either before or during menstruation. King (321) 
quotes several late studies and regards the question of 
the influence of menstruation on metabolic rate as still 
being open in 1924. DuBois (172) also finds the question
still open in 1927. Benedict and Finn (76) review
a literature of about twenty titles in 1928, however,
and report indications in their own experiments of
a slight lowering in metabolism during the menstrual 
period. This investigation included one study of
twenty subjects and two studies of one subject each. 

The investigator in industry is, of course, interested 
in these studies of the influence of menstruation on basal 
metabolic rate only in their indirect bearing on the 
question of whether physical work proceeds at an un- 
due energy cost during this period. Wiltshire (618) 
measured oxygen consumption and carbon dioxide output
during light work and found that the cost of work
and rate of recovery were the same during menstrual
and inter-menstrual periods. Bedale (43), in a recent
intensive study of one subject, found differences in the 
energy cost of work during and between menstrual 
periods, although these differences were slight. The 
differences, surprisingly enough, were in the direction 
of increased efficiency during menstruation. “There
seems no reason to think that the fundamental physio- 
logical rhythm of women is such as to affect, either 
considerably or constantly, the quantity or quality of 
their industrial work provided always that no pathological
conditions are present.” This study is supplemented
by a bibliography of fifty references, with a
review of the work done on the problem to date. 

Physiological Factors Subject to Experimental Con- 
trol. There remains now for discussion a large group 
of organic factors which are usually subject not merely 
to observation by the experimenter but are also more or 
less under his control. The necessity for controlling
certain of these factors in metabolic experiments is uni- 
versally recognized, although certain others are of 
minor importance and have not received wide attention. 
Most of the experimental work which has been done 
has been performed with a view toward determining 
the effect these factors have upon basal metabolic rate 
rather than possible influences upon the energy cost of
muscular work. The experimenter who is interested 
in the latter must therefore rely partly upon indirect 
evidence from experiments of the former type. 

Wishart (619), for example, has studied the influence
of previous muscular activity and other factors
upon the basal metabolism, but this does not tell us 
what effect previous muscular activity has upon the energy
cost of work. His study is described in Physiological
Abstracts, Volume 12, No. 1012: “The effect
of an hour’s moderately severe work on the previous 
day is to raise the basal metabolic rate one to two per 
cent and to lower slightly the respiratory quotient; 
these differences are so small as to be entirely obscured 
in only occasional observations.” 

The influence of sleep on basal metabolism in chil- 
dren has been studied by Wang and Kern (606), who 
found a drop in heat production ranging from 5.7 per 
cent to 30.6 per cent. Other studies of the effect of 
various degrees of muscular tension and relaxation will 
be discussed in a later section. 

The effect of loss of sleep has been studied by Laird 
and Wheeler (361) and Landis (362). Laird and 
Wheeler, using the Douglas Bag method, reported an 
increase in the energy cost of mental work following 
loss of sleep but found no effect on errors and even 
found an increase in rate. Three subjects were given 
practice in mental multiplication for several weeks, 
or until they had reached their apparent limit of prac- 
tice. Observations were then made for one week dur- 
ing which subjects were given eight hours sleep, and 
this was followed by observation for one week in which 
the subjects slept only six hours. There was, unfortunately,
no return to the eight-hour sleeping schedule,
nor were adequate controls run over the period to permit
complete isolation of the cost of the mental work factor.
As will be seen in the section on Mental Work, it
is dubious whether reliable conclusions may be drawn
from an experiment in which the cost of mental work 
is estimated by means of respiratory exchange in such 
short experimental periods; particularly is this so when 
the sampling occurs only after the close of the mental 
work. 

The influence of various states of under-nutrition and
fasting has been repeatedly investigated and the litera- 
ture has been thoroughly covered by Morgulis (433), 
whose monograph carries a bibliography of more than 
one thousand references with complete titles. There 
are good author and subject indexes and the volume is 
written with strict adherence to the experimental ap- 
proach. There are liberal quotations of data and 
charts of experimental results. Grafe (225) also supplies
78 titles on hunger as well as extensive bibliographies
on other controllable factors. A classic experiment
in this field is that of Benedict, Miles, Roth, and
Smith (78). This and other related Nutrition Lab- 
oratory studies are discussed by Benedict (53) in a 
lecture on the social and economic implications of the 
ability to resist fasting. Kunde (358) found a tempo- 
rary increase in the basal metabolic rate in prolonged 
fasting, and Landis (362) reports upon the metabolic 
effects of fasting in his study of the emotions. 

The disturbance of metabolic rate caused by the
taking of food has not always been eliminated from
experimental investigations. Certain early experiments,
such as that of Carpenter and Benedict (126),
were greatly weakened in the significance of their conclusions
through failure to control the effect
of food. Benedict and Murschhauser (79) reported in
1915 that “the heat production per unit of work is practically
independent of the taking of food.” This view
has been seriously modified in recent years, so that 
Cathcart and Orr (136), for example, report that, al- 
though they had considered the food-taking element in 
their experiment as being rigidly controlled, in search- 
ing for the cause of what appeared to be errors in tech- 
nique they discovered that the discrepancies were due 
to the practice of some of their subjects of partaking of 
midnight lunches. 

Higgins (277) in 1913 demonstrated an increased 
alveolar carbon dioxide tension throughout the duration
of the digestive processes and many studies before
and since have corroborated the fact that digestion in 
itself does increase the metabolic rate. The effect of 
digestion on metabolic rate, whether the body is at 
rest or at work, depends in large measure upon the nature
of the foodstuffs concerned. Orr and Kinloch (452)
summarize the earlier experiments and report an experiment
of their own employing the Douglas-Haldane
method. Their conclusions were: (a) following a
high protein meal the increase due to work is greater
than in the preceding post-absorptive state; (b) following
a high carbohydrate meal the increase is less
than in the preceding post-absorptive state; (c) following
a high fat meal there appears to be a summation
of the extra energy due to the food and that due to work.
The ingestion of carbohydrate has been studied in the 
resting condition by Benedict, Emmes, and Riche (74) 
and during work and rest by Cassinis (129). Viale
(575, 576) has also studied energy consumption in
human work before and after meals and during fasting.
Wishart (619, 621) demonstrates in each of the
references quoted that a high protein diet markedly increases
the basal metabolic rate on the subsequent day,
and Cathcart and Burnett (133) show that “under the 
conditions of the experiments the differences in the 
oxygen demand during work on diets which contain 
meat and those which are meat free are definitely sig- 
nificant” (low work respiratory quotients in the lat- 
ter). General treatments of the specific dynamic ac- 
tion of food will be found in the references by Jahn 
(304) and Lusk (387). 
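Since several of the studies above report their findings as respiratory quotients, it may help to recall that the quotient is simply the volume of carbon dioxide produced divided by the volume of oxygen consumed over the same interval. The sketch below computes it and, for orientation only, names the conventional reference value (about 1.0 for carbohydrate, 0.8 for protein, 0.7 for fat) lying nearest; a mixed diet gives intermediate values, so the label is a rough indication and not an analysis of the fuel burned. The function names and sample volumes are hypothetical.

```python
REFERENCE_RQ = {"carbohydrate": 1.0, "protein": 0.8, "fat": 0.7}


def respiratory_quotient(co2_l: float, o2_l: float) -> float:
    """Volume of CO2 produced divided by volume of O2 consumed."""
    if o2_l <= 0:
        raise ValueError("oxygen consumption must be positive")
    return co2_l / o2_l


def nearest_reference(rq: float) -> str:
    """Name of the foodstuff whose conventional quotient lies closest to rq."""
    return min(REFERENCE_RQ, key=lambda food: abs(REFERENCE_RQ[food] - rq))


if __name__ == "__main__":
    rq = respiratory_quotient(co2_l=11.4, o2_l=12.0)  # hypothetical work sample
    print(round(rq, 2), nearest_reference(rq))
```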

Experimental researches toward finding a breakfast 
which will not interfere with the determination of basal 
metabolic rates should also be of significance in certain
types of work experiments. Soderstrom, Barr, and
DuBois (529) fed 30 grams of bread, 8 grams of but- 
ter, and one cup of caffeine-free coffee with 10 grams 
of sugar and 60 cc. of milk to five normal subjects pre- 
ceding the determination of their basal metabolic rate. 
A slight rise was produced in respiratory metabolism, 
which disappeared after the third hour. Benedict and 
Benedict (45) also describe a nearly non-protein meal 
which is light and non-stimulating and suitable for 
feeding to subjects before a metabolism experiment. 
Bauer and Blunt (36) conclude that basal metabolic 
rate determinations of children may be made at noon, 
provided only that the breakfast does not exceed 420
calories, including 14 grams of protein, and that it is
eaten at least four hours before the test.
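The conditions that Bauer and Blunt attach to a noon determination can be written as a simple check. In this sketch the 14 grams of protein is read as an upper limit, which is an interpretation of the phrase “including 14 grams of protein”; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Breakfast:
    calories: float
    protein_g: float
    hours_before_test: float


def allows_noon_determination(meal: Breakfast) -> bool:
    """Bauer and Blunt's conditions for a noon basal test in children.

    Assumption: the 14 g of protein is treated as a maximum.
    """
    return (meal.calories <= 420
            and meal.protein_g <= 14
            and meal.hours_before_test >= 4)


if __name__ == "__main__":
    print(allows_noon_determination(Breakfast(400, 12, 4.5)))  # True
    print(allows_noon_determination(Breakfast(400, 12, 3.0)))  # False: eaten too late
```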

The work that has been done on the effect of emotion
is still too meager to show consistent and conclusive
results; we know that any emotion may have an influence
upon metabolism, but we do not always know in
what direction to expect the change, nor how confidently
to look for it. Thus Landis (362), experimenting
with human subjects to discover the effects of fasting,
insomnia, electrical stimulation, anger, etc., on
metabolism, concluded that emotional disturbance, per se,
does not lead to changes in the metabolic rate which
are always in the same direction or of the same magnitude.
Conversely, changes in metabolic rate cannot be
considered as direct measures of emotional disturbance
or cumulative emotional upset. On the other hand,
Ziegler and Levine (622), prompted by the presence of
abnormally high metabolic rates in certain psychoneurotic
individuals, experimented upon the effect of
emotion in such patients. Their results showed that
these psychoneurotics usually respond, when thinking
about an emotion-producing aspect of their past history,
with an increased metabolic rate. Segal, Binswanger,
and Strouse (511), in studying the nervous symptoms
in exophthalmic goiter, found that individuals
with toxic goiter not treated with iodine show the possibility
of a dangerous emotional rise in metabolism.
Totten (563) determined the oxygen consumption before,
during, and after a wide variety of intense emotional
stimuli. He found no increase in half the cases
studied and some increase in the others. Hafkesbring
and Collett (239) found that harsh or sudden noise
produced a rise in metabolism of 10 per cent for between
5 and 20 minutes.

The possible effect of minor emotional disturbances 
has not always been realized. Early studies on mental 
and muscular work, even under such careful workers 
as Benedict and Carpenter (69, 126), were often in- 
conclusive because of failure to take into consideration 
such potentially emotional stimuli as the influence of 
novelty. It may be that this is not a highly serious matter,
however, for Hendry, Carpenter, and Emmes
(268) have shown that unpracticed subjects differ but 
slightly from practiced, although one practice period 
is recommended. 

McDowall and Wells (420) hold that monotony in 
the physiological sense is the exact opposite of emotion.
They have established that vascular reactions
cease as soon as the stimulations or effort become mo- 
notonous. 

Grafe and Mayer (227) and Grafe and Traumann 
(229) have conducted experiments on the metabolic 
effect of emotions in the hypnotic state. 

DuBois (172) has a good chapter on the effect of 
emotional states, and Grafe (225) has an extensive
bibliography on emotional states and mental disorders.

Poffenberger (467) points out that metabolic techniques
open a promising means of investigating the actual
effect of drugs upon human efficiency. Without
the use of such measures we can only determine that a 
certain drug has a certain effect upon an individual’s 
performance in mental tests or other objective tasks. 
Very often this effect is found to be either insignificant
or else highly variable. If the energy cost of the task
could also be measured before and after administration 
of the drug we would have a means of determining 
more accurately what actual influence the drug may 
have had upon efficiency. Not much work has been 
done along the line of this suggestion, however, so that 
we shall need to be content with mentioning a few of 
the more general reviews of the effect of drugs on basal 
metabolic rate. 

Boothby and Rowntree (101) conclude that the com- 
mon drugs (not including preparations representing in- 
ternal secretion) do not demonstrably influence the ba- 
sal metabolic rate, but that iodine, adrenalin, etc., do 
have a calorigenic action. Hardikar (250), however, 
found a tendency toward increased ventilation, respira- 
tory exchange, and heat production with doses of quin- 
ine up to two grams, and Higgins (277) found a fall 
in alveolar carbon dioxide tension after administering 
coffee without food. A very extensive review of the 
effects of various drugs and poisons upon metabolism 
has been written by Barbour (32), and chapters on the 
effect of drugs will be found in the books by DuBois 
(172), McCann (412), and Amar (10). 

The delicacy of respiratory exchange techniques is 
well illustrated by the fact that they register changes 
resulting from slight differences in body posture and 
various minor muscular movements. Experiments 
fairly consistently show an increased metabolic rate in 
standing or sitting positions as compared with reclining
or relaxed positions, as well as increases from certain
minor body movements. These results, while of
little direct significance to the worker in muscular 
metabolism, are of importance in that body posture and 
minor movements may cause serious errors in metabolic 
experiments designed to study the effects of emotion, 
mental work, or other factors having an effect of low 
magnitude. 

Benedict and Benedict (64) found that oxygen consumption
increased 0 to 11 per cent in sitting as compared
with lying, averaging about 3 per cent, and 9 to
24 per cent in standing as compared with lying, aver- 
aging about 10 per cent. They also found that one 
movement of the hand to the forehead per minute while 
lying was of no significance but that there was an in- 
crease of oxygen consumption of U cc. for each time 
the legs were crossed while lying down. For a more 
complete treatment of this material see Benedict and
Benedict (67).

More energy is consumed, apparently, in lying flat 
in bed than when in a semi-reclining position. Emmes
and Riche (185), experimenting with two subjects,
showed that there is an increase in metabolism of 8
per cent in the lying position as compared with sitting
upright, and Higgins (277) found a higher alveolar
carbon dioxide tension in subjects in a relaxed position
than when they were in an erect position. Soderstrom,
Meyer, and DuBois (530) studied four normal men 
and two cardiac patients lying flat in bed and in the 
semi-reclining position propped up with a back rest, or 
else in a steamer chair. Twenty-one experiments 
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showed that metabolism averaged 3 per cent lower in 
semi-reclining posture. 

DuBois-Reymond and Peltret (99a) concluded that 
energy expenditure is affected by the state of tension, 
and Wishart (619), as we have quoted above, even 
found that an hour’s moderately severe work on the
previous day resulted in a 1 to 2 per cent rise in the basal
metabolic rate. Other studies have been made by
Simonson (520) on the physiology of standing, by
Liljestrand and Wollin (379) on body posture, and by
Turner (564) on reclining, sitting, and standing posi- 
tions. Benedict (79) and Studer (548) have measured 
the effects on metabolism of resting and standing as a 
preliminary part of their studies on muscular work. 
Similar preliminary studies of the resting state have 
been made by a large number of investigators. 

Before concluding our discussion of the controllable 
organic factors we may mention a few miscellaneous 
influences which have not attracted such wide experi- 
mental attention. Glandular activity, for example, 
must take place at a metabolic cost of some sort, al- 
though we do not know in many instances just what 
such costs amount to. Bircher (85) performed an ex- 
periment the results of which implied that sweating 
does not in itself increase oxygen consumption to any 
great extent. The following is quoted from the Journal
of the American Medical Association, Volume 80,
page 729. “Bircher produced sweating in 20 persons 
by heat and light. The oxygen consumption increased 
13 per cent over the basal rate and the temperature of 
the body about one degree centigrade. The increased 
metabolism lasted longer than the sweating.” Kost
(347) found that oxygen consumption is not influenced 
through massage. Benczur and Berger (44) found a 
slight increase in alveolar carbon dioxide tension with 
the application of heat and a slight lowering with the 
application of cold. This was a clinical study referring 
to local applications of these temperature differences. 
Pemberton (462), in a study of exposure of the body to 
the therapeutic application of external heat, found 
heightened blood flow, increased metabolism, and in- 
creased elimination of acids, chiefly through expiration 
of carbon dioxide. Lusk and DuBois (393) found that 
“cage life” reduces basal metabolism in the dog and 
has a probable counterpart in lack of exercise and
indoor confinement in man. Ponzo (470, 471) studied
the influence of volitional factors on respiration. He
discusses psychological influences upon the rate and
character of breathing, finding momentary suspension
of breathing during close attention, acceleration and
retardation from thoughts with varying affective tones,
and influences from the slight laryngeal movements
involved in thinking. These studies, of course, refer to
the effect upon the respiratory curve and not upon
metabolism as measured by respiratory exchange.

Mental Work. The subject of mental work will be
given consideration here under its own specific heading
because it is one of the most interesting and significant
of the organic factors under control of the
experimenter. It is placed last in the list, however, because
its actual influence upon metabolic rate is so slight that
it can be detected only in the most carefully controlled
experiments, utilizing the most sensitive techniques. It
appears to be well established that, while purely
intellectual work may have a measurable physiological
cost, such cost is of such low magnitude as to make it
easily obscured by practically any or all of the
physiological variables listed in the preceding section. This
makes the problem of providing adequate controls a
most difficult one and excludes all but the most refined
techniques of measurement. Routine determinations
of the energy cost of mental work in business and
industry will never be possible by means of respiratory
exchange determination methods.

The classic experiment on the determination of the 
energy cost of mental work is that of Benedict and 
Carpenter (69). These authors reviewed the early 
work on metabolism in mental and muscular work and 
reported their own studies on this subject in detail. 
Although their experiment is usually quoted as demon- 
strating absence of measurable energy cost in mental 
work, their own conclusions are actually not quite so 
negative: “From the results . . . it would appear that
the pulse rate was slightly increased, the body
temperature somewhat higher, the water vapor output
increased by about 5 per cent, the carbon dioxide
production increased by about 2 per cent, the oxygen
consumption increased about 6 per cent and the heat
production increased by about one-half of one per cent as
a result of sustained mental effort such as obtains
during a college examination.” The reason this
experiment is usually quoted as demonstrating no increase in
metabolism during mental work is found in the fact
that these authors qualify each finding by pointing out
possible inaccuracies of technique and limitations in
interpretation. They disclose, for example, the serious
oversight that the controls were all run after the mental
work, so that the element of novelty in the situation
was absent in the controls but present in the
experimental trials. This is the principal factor tending to
invalidate the conclusions of their experiment. A
popular presentation of their results was written a year later
by Benedict (48).

Other metabolic studies, showing the contradictory 
results characteristic of mental work investigations, are 
those of Ilzhofer (300) and Chlopin, Jakowenko, and
Wolschinsky (139). The former, using the Krogh
apparatus, found that metabolism during mental work is
not increased materially, while the latter study
indicates an increase in metabolism in general. A review
of Ilzhofer’s work will be found in the Journal of the
American Medical Association (83). Chlopin, Jakowenko,
and Wolschinsky review and criticize the
earlier literature on mental work in a way that gives 
support to their own findings. 

Becker and Olsen (39) have described their methods 
and apparatus in great detail in the expectation that 
other psychologists might wish to use them in similar 
experiments. Poyer (473) has written a theoretical
article with a brief bibliography on mental work.
Spencer (533a) contributes summaries and references
on mental work, and Grafe (225) has a bibliography of
23 titles on this and related subjects. The DuBois
manual (172) has a chapter on mental work, and
further reference material will be found in Takahira and
Ishibashi (554), Mainzer (399), Liebermann (377),
and Kestner and Knipping (318). This last study is
on the relation of protein diet to mental work.

The influence of mental activity on vasomotor 
processes has been studied by several workers. Gilles- 
pie (215) , in a study of the relative influence of mental 
and muscular work on the pulse rate and blood pres- 
sure, concluded that mental work produces an increase 
in pulse rate and blood pressure, that this increase is 
independent of emotional factors, and that it cannot be 
accounted for by movements of articulatory muscles or 
known muscle tensions. Combined mental and muscu- 
lar work was found to produce a greater effect than 
either singly. In women the pulse rate change was
found to be proportionately twice as great as the blood
pressure change, but in men the two were about the same.
McDowall (419) also found that pulse rate and blood
pressure are both raised by either mental or muscular
work. This study carries a good non-technical account
of the physiology concerned. Day (157) gives an
excellent summary of experiments on the effect of mental
work on pulse rate and blood pressure, as well as other 
factors influencing vaso-motor processes. 

Dodge (165) gives a thoughtful account of the psy- 
chological justification for including a study of men- 
tal work in psychology. In this article will be found 
a critical discussion of pulse rate determination as a 
simple metabolic rate determining technique, and a 
statement of the desirability of finding and developing 
a more reliable index which will retain the advantage 
of simplicity. Dodge describes elaborate
electrocardiograph and string galvanometer experiments in
connection with the study of the metabolism of mental
work. 

Several other techniques have been employed to
determine the energy cost of mental work. Goldberg and
Lepskaia (219) have studied the alterations of the
white corpuscles in the blood during mental and
physical work. They found more pronounced changes
during mental than physical work, with a return to normal
in about two hours. Knipping (335) has found the
measurement of phosphoric acid in the blood a more
satisfactory measure of mental work than respiratory
exchange determination.

Environmental Factors Affecting Metabolism. On 
casual thought it would appear to be a simple matter 
to classify the factors influencing metabolism as being 
either environmental or organic. The effect of food
and drugs is obviously physiological; the influence of
temperature extremes in the atmosphere is clearly
environmental. But when we attempt to classify such
factors as oxygen lack in the atmosphere we find
ourselves in some degree of doubt, and when we go to such
factors as the effect of noxious gases, as studied by
Henderson and Haggard (266), we realize the
relativity of our classifications. We classify various
factors as an aid in discussing them, but these
classifications do not carry the implication that there is a
fundamental distinction between the groups comprising our
artificial schema.

The effect of high and low atmospheric temperatures
is probably the most important of the environmental
variables affecting metabolism. This is treated, of
course, in the general manuals, of which Lusk (387)
may be taken as an example. Lusk publishes on page
149 an interesting table showing the influence of
clothing on metabolism. Carbon dioxide production is
increased when scant clothing is worn in a cold room.
McConnell and Yagloglou (416) and McConnell,
Yagloglou, and Fulton (417) demonstrated that the
rates at which oxygen is consumed and carbon dioxide
produced increase with exposure to either high or low
temperatures, and that the metabolic rate increases
rapidly when the environmental temperature is higher 
than that of the body. Other studies on the effect of 
high air temperatures on basal metabolism were made 
by Benedict, Benedict, and DuBois (46), and Bircher 
(85). Moss (436), Volshinski and Yakovenko (579),
and Viale (572) have studied the effect of high
temperatures on working metabolism. Hill (288) and
Hill and Campbell (289) studied the effect of cool air
currents on working metabolism, the latter finding that
“during 15 minutes work on a bicycle ergometer the
cooler conditions greatly relieved the heart — reduced
the pulse rate — although the gross and net efficiencies
of muscular contraction were not affected.” This is
quoted from the Abstract of Literature of Industrial 
Hygiene, taken in turn from Physiological Abstracts, 
Volume 7, pages 472 ff. 

The effect of temperature differences is certainly 
great enough to warrant considerable care on the part 
of the investigator in controlling this factor in
experiments upon muscular work. There is still some
question as to just how nearly constant the temperature must
be maintained in determining basal metabolism. There
is some evidence that basal metabolism determinations 
should be made with the subject in a warm bath of con- 
stant temperature. This is discussed by Benedict and 
Benedict (66) and Mayer and Wurmser (410). 

The effect of tropical and Alpine climates on basal 
metabolism and working metabolism is the long-time 
aspect of the temperature differences discussed above. 
The physiological effects of tropical climate have been
surveyed in a comprehensive way by Sundstroem
(551). Ozorio de Almeida (455) and Hindmarsh
(294) found a lowered basal metabolism in tropical
climates. Hindmarsh attributed this to the ready
relaxation favored by the warm environment. Others
who have studied the effect of tropical climate on basal
metabolism are Coro (146) and Montoro (429).
Cohen (140), in a letter to the Journal of the American
Medical Association, pointed out what he considered
an old and important observation made by Mayer in
Java indicative of lessened oxidation of tissues in
tropical climates.

It is often difficult to separate the effects of race and 
the effects of climate in studying metabolic differences 
of peoples living either in tropic or frigid regions. 
Heinbecker (259), for example, finds that “the basal
metabolism of Eskimos is considerably higher than that
of persons living in temperate zones.” But we cannot
be certain about the factors to which this should be
attributed. Hill and Campbell (290) found that the
heat production of resting subjects exposed to Alpine
air increased between 40 and 50 per cent in the case of
clothed adults and 60 to 90 per cent in the case of nude
children. Viale (571) has studied working metabolism
in an Alpine climate. Corlette (145) found a cold,
moist atmosphere to be a milder environment than cold,
dry air. Dhar (159), in a theoretical article concerning
man only indirectly, notes that it is more advantageous
for a man living in a colder climate to change to a
warmer climate than to change in the contrary
direction.

There are various types of radiation, such as sun- 
light, Roentgen rays, ultra-violet light, etc., which have 
been thought to play some part in determining meta- 
bolic rates. Their influence, however, is not great and 
this factor will probably not need careful control in in- 
dustrial experiments. A review of the literature on 
sunlight and other kinds of radiation has been written 
by Laurens (368). There is a section on the effects
upon metabolism, pages 40-50, and a bibliography on
the effects upon metabolism, page 87. Ultra-violet
radiation has been studied by Crofts (149) and by Mason
and Mason (408). The latter found that ultra-violet
light from a quartz mercury vapor lamp is capable
of lowering the total metabolism of some persons.
Flicicingcr (193) and Campbell (120), however, were
unable to find this effect. Lindhard (381) believed that
seasonal periodicity in respiratory functions is due to
the varying intensity of the sunlight.

Lyon and Greisheimer (394) describe an experiment
in which the air surrounding the body of human
subjects was kept constant in temperature and humidity
while air of constant temperature and varying
humidity was inspired for 10- to 15-minute periods. Their
tentative results indicate higher pulse rate, lower
respiration rate, higher arterial pressure, and greater
peripheral vascularity while breathing moist air than while
breathing dry air.

A good review of the work on high and low atmo- 
spheric pressures, as well as with air of abnormal com- 
position, will be found in Haldane (243). The effect 
of low pressures has been studied both in specially con- 
structed chambers and in experiments performed at 
high altitudes. A celebrated experiment of this type is 
that of Douglas, Haldane, Henderson, and Schneider 
(169). This experiment, known as the Anglo-American
Pike's Peak Expedition, was productive of many
useful conclusions. The respiratory exchange was
found to be unaltered by the altitude, both at rest and at work, and
acclimatization was distinctly evident after two or three
days’ residence on the summit. Schneider (501)
reported changes in the blood circulation and respiration
of a man who lived for a long time on the summit of
Pike's Peak. The influence of rapid change of
altitude on circulation and respiration has been studied
by Hecht (257). Other altitude studies are those of
Kestner and Schadow (319) and Herxheimer, Wissing, 
and Wolff (273). 

Schneider and Clark (502-504) have conducted a
series of laboratory experiments on the effect of low
barometric pressure on working metabolism. These are
careful experiments in which several physiological
factors are studied, although work costs are estimated only
on the basis of oxygen consumption. They point out
(504) that when one is first exposed to low pressure the
oxygen want stimulates the respiratory center, causing
a blowing off of preformed carbon dioxide. Carbon
dioxide output therefore cannot be used for measuring
metabolism in experiments under low atmospheric pressure. Schneider,
Truesdell, and Clark (507) studied oxygen consump- 
tion at rest during short exposures to low pressures in 
an experiment preliminary to those just described. 
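The point made by Schneider and Clark can be illustrated with the respiratory quotient. This is the ratio of carbon dioxide produced to oxygen consumed over the same interval, and from oxidation alone it normally lies between about 0.7 (fat) and 1.0 (carbohydrate). The sketch below is a hypothetical check, not their procedure. The function names, the 1.0 cutoff, and the sample figures are assumptions. A quotient well above 1.0 early in an exposure would suggest that stored carbon dioxide is being washed out rather than produced, which is why oxygen consumption is the safer measure under these conditions.

```python
def respiratory_quotient(co2_cc: float, o2_cc: float) -> float:
    """Ratio of carbon dioxide output to oxygen consumption over one period."""
    if o2_cc <= 0:
        raise ValueError("oxygen consumption must be positive")
    return co2_cc / o2_cc


def co2_suspect(co2_cc: float, o2_cc: float, upper: float = 1.0) -> bool:
    """Flag a quotient above the range expected from oxidation alone.

    The 1.0 cutoff is an illustrative assumption; a higher value hints that
    preformed CO2 is being blown off by overbreathing.
    """
    return respiratory_quotient(co2_cc, o2_cc) > upper


if __name__ == "__main__":
    # Hypothetical one-minute readings, cc. per minute.
    print(respiratory_quotient(200.0, 250.0))
    print(co2_suspect(200.0, 250.0))
    print(co2_suspect(320.0, 250.0))
```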

There still remains a wide variety of environmental 
conditions which have been thought to play a part in 
determining metabolic rates, but most of these are of 
small significance to the industrial investigator. Some 
of them are principally of therapeutic interest, such as 
the study of massage by Kost (347) and studies of 
therapeutic baths, such as the carbonic acid bath 
studies by Laquer and Gottheil (367) and Wassermann
(608). The carbonic acid bath increases the rate of
carbon dioxide exhalation. The effect of sea baths has 
been studied by Margaria (401), who found an influ- 
ence on the temperature of expired air and the ventila- 
tion of the lungs. 

Benedict and Finn (75) publish results of experiments
showing that the basal metabolic rate of an individual
is in general sufficiently fixed to be unaltered
by a summer’s vacation, even when pronounced sub- 
jective impressions of regeneration are experienced. 

McDowall and Wells (420) have written a theoreti- 
cal discussion of the physiology of monotony. This has 
already been treated in our consideration of the effect
of emotions. 

Diserens (163) has made a comprehensive study of 
the influence of music on behavior, but, although the 
volume carries an exhaustive bibliography, there are
only two references to the effect of music on gaseous 
metabolism and both of these are very early animal 
experiments. The reviewer has encountered no experi- 
ments up to 1928 on the effect of music upon human 
gaseous metabolism. This would appear to be a fruit- 
ful field for investigation both from the theoretical 
side and from the standpoint of a possible reduction in 
energy expenditure during work. 

General References on Factors Affecting Metabolism.
In concluding, we may mention several references
in which the factors influencing metabolic rate are
reviewed in a general way, including a consideration of
both organic and environmental factors.

A general survey of factors affecting metabolism was
written by Benedict (50) in 1915 and a very readable
popular account in 1928 (63), in which the history of
metabolism and its measurement was also included.
He wrote a more detailed treatment of the factors
affecting metabolism, as well as a description of types of
apparatus, etc., in 1924 (54). Benedict and Carpenter
(70) discuss a variety of factors in a series of
experiments on the metabolism of healthy men in the resting
state. DuBois (172) lists six variable factors which
must be controlled in metabolism experiments and
twenty factors to be taken into account in interpreting
results. Knipping (334) has also discussed the many
factors which must be considered in basal metabolism
investigations. Amar (10) treats of these factors more
particularly as they bear on working metabolism.
Other studies of a more special sort are those of Sainton
and Peron (495) and Harris and Benedict (251).
APPARATUS AND METHODS

General Manuals on Apparatus and Methods. Aside
from a few very early contributions such as those of
Jaquet (306), Loewy (383), and Tigerstedt (561), all
of the general reference manuals, monographs, and
textbooks that treat of methods and apparatus used in
determining human energy consumption have already
been mentioned in preceding sections of the present
review. It should be useful, however, to assemble them
in chronological order here as a matter of
convenience.

Those interested in early types of apparatus will do 
well to investigate Johansson (309), Lefevre (373), 
and Amar (10) in addition to the three studies men- 
tioned in the preceding paragraph. 

A more modern group of references comprises those 
of Carpenter (123), published in 1915, Krogh (354), 
published in 1916, Boothby and Sandiford (104), pub- 
lished in 1920, and Sanborn (496), published in 1922. 
Some of these books will be referred to later in discus- 
sing specific types of apparatus. 

Aside from the textbook by DuBois (172), already
quoted several times, all of the modern reference
manuals in which there is adequate treatment of
methods and apparatus have been published in German.
Benedict (54) has written an article in which all the
forms of apparatus used by the Nutrition Laboratory
are described. These include chamber types, portable
types, student apparatus, micro-respiration apparatus
and devices such as ergometers and tread-mills. His 
primary concern, however, is with the apparatus used 
in determining basal metabolism. Klein and Steuber 
(325) treat fully of the various types of apparatus used 
in determining oxygen and carbon dioxide content of 
gases. Abderhalden (2) edits an exhaustive and
comprehensive symposium on calorimetry and gaseous
metabolism, providing a complete working manual on
all types of apparatus and methods. Krauss (348)
describes and illustrates a wide variety of experimental
apparatus and methods, although he does not include
much that is useful in work experiments. Knipping
and Rona (343) describe the best modern methods of 
both direct and indirect calorimetry, tread-mills, ergo- 
meters, masks, electrical devices and many modern im- 
provements and useful points of technique. The sec- 
tions on energy metabolism, pages 98-125, and work 
metabolism, pages 223-240, constitute a clear and 
comprehensive account of modern techniques, well 
organized, and supplied with splendid cuts and dia- 
grams. Knipping and Kowitz (342) have written a 
splendid little working manual with illustrations of 
apparatus and descriptions of apparatus, both early and 
modern. Their interest is not primarily in working 
metabolism, however. 

In addition to the various books listed above there are 
several articles in the periodical literature of consider- 
able interest here. Benedict (58) has written an 
extended and useful account of modern techniques in 
the measurement of the gaseous metabolism of man, 
and will be referred to later in discussing certain of the
methods treated. Hendry, Carpenter, and Emmes 
(268) describe an important experiment in determining 
the relative merits of various combinations of respira- 
tion apparatus and breathing appliances. Pickworth 
(465) has also compared different techniques, finding 
that basal metabolic rates determined in a chamber 
were more than 20 per cent lower than those deter- 
mined by the Douglas Bag method. 
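The Douglas bag method named above is an open circuit technique. The expired air is collected, its volume measured, and a sample analyzed. A common way to turn such data into oxygen consumption assumes that nitrogen is neither absorbed nor given off, so that the inspired volume can be inferred from the expired volume and the two nitrogen fractions (the Haldane transformation). The sketch below assumes dry outdoor inspired air of 20.93 per cent oxygen and 0.03 per cent carbon dioxide, and volumes already reduced to standard conditions. The names and the sample collection are hypothetical, and it is not offered as the calculation used by Pickworth.

```python
from typing import NamedTuple

# Assumed composition of dry inspired (outdoor) air, as fractions.
FI_O2 = 0.2093
FI_CO2 = 0.0003
FI_N2 = 1.0 - FI_O2 - FI_CO2  # nitrogen plus the inert gases


class GasExchange(NamedTuple):
    vo2: float   # oxygen consumed, litres (standard conditions)
    vco2: float  # carbon dioxide produced, litres (standard conditions)


def open_circuit_exchange(ve: float, fe_o2: float, fe_co2: float) -> GasExchange:
    """Gas exchange from an expired volume `ve` (litres, standard conditions)
    and the O2 and CO2 fractions of the analyzed expired sample."""
    fe_n2 = 1.0 - fe_o2 - fe_co2
    # Nitrogen is assumed neither absorbed nor released, so VI * FI_N2 = VE * FE_N2.
    vi = ve * fe_n2 / FI_N2
    vo2 = vi * FI_O2 - ve * fe_o2
    vco2 = ve * fe_co2 - vi * FI_CO2
    return GasExchange(vo2, vco2)


if __name__ == "__main__":
    # Hypothetical collection: 60 L expired, 16.5% O2 and 4.0% CO2 in the sample.
    r = open_circuit_exchange(60.0, 0.165, 0.040)
    print(f"O2 {r.vo2:.2f} L, CO2 {r.vco2:.2f} L, RQ {r.vco2 / r.vo2:.2f}")
```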

Other journal articles of less significance to the
industrial worker are those of Meserve (425), describing
certain types of metabolic apparatus, and Stoner (541),
describing the organization of a metabolism laboratory
suitable for a large hospital. The reference to Jones
(312) is only of minor significance, and the reference to
Cowgill (147) was retained in this bibliography
through oversight, as an animal experiment is described 
which has no interest for us. 

Incidental and Non-Quantitative Indexes of Metabolic
Rate. Before discussing apparatus and methods
used in respiratory exchange determination as a means
of discovering the energy cost of work, we shall do well
to examine briefly a few of the more important
physiological indicators of metabolic rate which may be
employed in experiments that do not demand
quantitative results expressible in terms of calories per unit of
time.

The most important and the most widely used of such
measures are vaso-motor records such as pulse rate,
pulse pressure, blood pressure, etc. Gillespie (215)
experimented upon the relative influence of mental and
muscular work on the pulse rate and blood pressure,
having been led into this field through Benedict’s as- 
sertion that “pulse rate indicates, in a general way, 
internal muscular work and muscular tonus.” He 
found pulse rate and blood pressure to be affected by 
mental work or muscular work or both. Cathcart, 
Bedale, Macleod, Wetherhead, and Overton (132) 
found a general, but not strictly proportionate, corres- 
pondence between pulse rate and gaseous exchange and 
blood pressure and gaseous exchange. McConnell and 
Yagloglou (416, 417) also found that both pulse rate 
and body temperature correlate fairly well with the 
basal metabolic rate, and Read (479), as a result of 300
determinations, concluded that pulse pressure and rate 
vary in the same direction as the basal metabolic rate. 
Jackson (303) even contended that the two factors, 
pulse rate and pulse pressure, taken under basal con- 
ditions enable one to estimate the basal metabolic rate 
with considerable accuracy. Smith (525) found pulse 
rate a fair metabolic index in grade walking at various 
rates and upon different grades, although it was not 
reliable in the case of horizontal walking and even in 
grade walking showed wide individual fluctuations. 

Hafkesbring and Collett (239), on the other hand, 
found no correlation between temperature and pulse 
nor between pulse rate and basal metabolic rate. 
Frumerie (199), furthermore, found pulse rate unsatis- 
factory as an index of muscular activity, and Dodge 
(165) considers pulse rate determination only a
makeshift in mental work experiments. Pulse rate
determinations are rather easily obtained in most
experiments, however, and records of them will be found in
most experiments on working metabolism. A good
summary of the experimental work which has been 
done on pulse rate and blood pressure under varying 
conditions has been written by Day (157). 
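The conflicting findings above concern how closely pulse rate follows metabolic rate, a question usually summarized by a correlation coefficient. As a generic statistical illustration, and not the method of any study cited, the following sketch computes Pearson's product-moment coefficient for paired readings such as pulse rate and oxygen consumption. The function name is hypothetical. A value near zero would correspond to the result of Hafkesbring and Collett, while a strongly positive value would correspond to the close relation for which Jackson contended.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation between paired measurements,
    e.g. pulse rates and metabolic rates taken at the same sittings."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the two means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```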

A closely related physiological measure is that of 
deep body temperature. This was recorded in many of 
the experiments performed by Benedict and his
colleagues, for example, the study by Benedict and
Cathcart (72). There is usually a fair correspondence
between changes in deep body temperature and the 
amount of external work being performed. 

Knoll (344) has considered the relationship between
respiration rate and the volume of air inspired, but 
there does not seem to have been much work done on 
correlating respiratory rate with external work. Schnei- 
der (499) finds that breath-holding power with the 
lungs deflated, when the subject is under basal rate con- 
ditions, yields results which harmonize closely with 
those obtained by means of the usual methods of de- 
termining basal metabolic rate. This is, of course, a 
wholly different matter from that of determining 
energy expenditure from respiratory rate, but the con- 
tribution may prove significant in helping to establish 
respiratory rate as an approximate index of energy 
metabolism. 

The electrocardiogram has been used by Messerle 
(426) in experiments upon work and the effect of rest 
pauses. He reviews the literature in this field and in- 
dicates that such measures may provide a useful method 
of studying muscular work. Dodge (165) and others 
much more recently have used the electrocardiogram
for studies of mental work and emotions.

An approximate index which would appear to be 
very useful, but which has not been widely employed, is 
that of determining loss of weight during work. 
Shepard (515) and Moog and Schwieder (430) 
utilized this measure, Shepard using it in an industrial 
study and Moog and Schwieder in studying swimming 
and running. The latter investigators found that 
swimming 1000 meters caused as great a loss of water 
from the lungs and skin as was produced by running 
10,000 meters. 

A determination which is fairly simple to make but 
which is not highly significant is that of the acidity of 
the urine before and after work. Hastings (254) found
that the urine of men engaged in manual labor tended 
to be of a slightly higher degree of acidity than that 
of men at rest, and that, while the urine of muscular 
individuals was more acid after work than before, there 
were variations in the case of weaker persons. 

Many of the most complete and careful studies of 
muscular work have included records of most of the 
measures we have mentioned as well as determining 
respiratory exchange or its equivalent. Typical 
thorough studies of this type are those of Cook and
Pembrey (144), McKeith, Pembrey, et al. (395),
Benedict and Carpenter (70), and Bedale (41). Although
Bedale determined ventilation rate, respiration rate,
pulse rate, and blood pressure, in addition to oxygen
consumption, only the pulse rate and ventilation rate
were found to bear a significant relationship to oxygen
consumption.

Direct Calorimetry. The interest of the industrial 
engineer or the industrial psychologist in the methods 
of direct calorimetry is principally an historical one. 
The many careful experiments of Benedict and Car- 
penter, for example Numbers 69, 70, 71, and 124 of 
our bibliography, as well as numerous experiments by 
other early workers have demonstrated beyond all 
doubt that energy expenditure may be as accurately 
determined indirectly through the respiratory exchange 
as by the most sensitive direct measurements in a res- 
piration calorimeter. Those who may have an interest 
in this type of apparatus are referred to the studies just 
enumerated and to the somewhat more recent studies of 
Langworthy and Milner (366) and Langworthy and 
Barott (365). The differential calorimeter described
by Noyons (450) is one of the few calorimeters in use
at the present time. Noyons’ complex and expensive
direct calorimeter is described as being responsive to a
muscular act on the part of a subject placed within it in
about 10 seconds and sufficiently sensitive to measure
the heat resulting from raising the forearm only 15
centimeters. The book describing this apparatus was
unavailable to the reviewer; it is abstracted in
Psychological Abstracts, Volume 1, No. 1497.
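The equivalence between direct and indirect calorimetry rests on converting gas exchange into heat. One later and widely used approximation, Weir's formula of 1949, estimates heat production from oxygen consumption and carbon dioxide output. It was published after this review and so was not available to the authors discussed here. The sketch below uses its abbreviated form, which omits the small correction for protein metabolism. The function name and the resting values in the example are hypothetical.

```python
def weir_kcal_per_min(vo2_l_per_min: float, vco2_l_per_min: float) -> float:
    """Abbreviated Weir (1949) estimate of heat production, kcal per minute.

    Gas volumes are litres per minute reduced to standard conditions;
    the urinary nitrogen (protein) term is omitted.
    """
    return 3.941 * vo2_l_per_min + 1.106 * vco2_l_per_min


if __name__ == "__main__":
    # Hypothetical resting values: 0.25 L O2 and 0.20 L CO2 per minute.
    rate = weir_kcal_per_min(0.25, 0.20)
    print(round(rate, 3), "kcal per minute")
    print(round(rate * 60, 1), "kcal per hour")
```

Because the carbon dioxide coefficient is small, the estimate depends mainly on oxygen consumption. This is consistent with the practice, noted elsewhere in this review, of estimating work costs from oxygen consumption alone.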

Some of the older types of calorimeters provided for
determination of energy expenditure not only by means
of direct calorimetry but also through some form of
respiratory exchange determination, usually by means
of closed circuit methods. Another characteristic of
the older types of apparatus was that many of them
were constructed to accommodate studies of several 
persons participating in an experiment within the same 
chamber. The use of the closed circuit methods in 
indirect calorimetry will be treated in detail in a later 
section; we shall pause here for a few references con- 
cerning group calorimetry. Benedict, Miles, Roth, 
and Smith (78) describe a well-constructed group res- 
piration chamber with tread-mill and facilities for 
work experiments, and Benedict and Johnson (77) de- 
scribe a large chamber for studying groups of thirty 
to forty persons by carbon dioxide elimination only. 
Gullichsen and Soisalon-Soininen (236) and other 
Swedish experimenters have also employed respiration 
channbers with provisions for determining carbon 
dioxide output only. Benedict (58) describes a new 
respiration chamber of the open circuit type which 
should prove useful in careful laboratory studies of 
working metabolism. The sensitive gas analysts tech- 
nique developed by Carpenter (122) is employed in 
conjunction with the chamber described by Benedict. 

Indirect Calorimetry: Open Circuit Methods, Gasometer
Type. The remaining sections of this part of
our paper will be concerned with apparatus and 
methods used in indirect calorimetry; these will be 
considered under the two broad categories of open 
circuit methods and closed circuit methods, which will 
in turn be subdivided into various types. 

Open circuit methods in their primary form may be 
thought of as consisting of a calibrated gasometer for 
the collection of expired air and suitable arrangements 
for sampling this air for later chemical analysis. Some
variant of this standard form is typically regarded as
essential in scientific work in which no sacrifice is to
be made in completeness or accuracy.

A good description, with student laboratory instruc- 
tions, of the apparatus used in the gasometer or spiro- 
meter method has been written by Henderson (264). 
Haldane (242) describes the original Haldane gas 
analysis apparatus, now often modified after Henderson 
(263). DuBois (172) says of this method, page 93, 
“The technic of air analysis is rather difficult, requir- 
ing at least a solid month of practice.” The present 
writer, however, following the instructions given by
Bailey (28) and Boothby and Sandiford (104), was
able to run analyses which checked satisfactorily within
one week. It would seem that a person trained in
laboratory methods should not require the amount of
time mentioned by DuBois, although it is true that even
a mechanically inclined experimenter is likely to have
occasional difficulty with the apparatus for a rather
long period of time. The manual by Boothby and
Sandiford carries full instructions covering all parts
of the method necessary for ordinary experimental
work. They describe the collection of gas samples, the 
analysis of gas, methods of computation, and use of 
accessory apparatus such as the gas mask. Bailey (28) 
has also written a very careful and detailed description 
of basal metabolic rate determination by the gasometer 
method. He describes the use of sampling bottles and 
illustrates each step in gas analysis. 
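The arithmetic behind the gasometer computation can be sketched briefly. The version below uses the usual nitrogen (Haldane) transformation, which infers the inspired volume from the expired volume on the assumption that nitrogen is neither absorbed nor given off. The inspired-air composition of 20.93 per cent oxygen and 0.03 per cent carbon dioxide is an assumed standard value, the function and field names are hypothetical, and the sample figures are illustrative, not data from any study cited here. It is a sketch of the general method, not the exact procedure of Boothby and Sandiford or of Bailey.

```python
from dataclasses import dataclass

# ASSUMED composition of inspired (outdoor) air, as fractions.
FI_O2 = 0.2093
FI_CO2 = 0.0003
FI_N2 = 1.0 - FI_O2 - FI_CO2  # nitrogen plus inert gases


@dataclass
class GasExchange:
    o2_consumed: float   # liters per collection period
    co2_produced: float  # liters per collection period
    rq: float            # respiratory quotient, CO2 / O2


def open_circuit_exchange(expired_volume: float, fe_o2: float, fe_co2: float) -> GasExchange:
    """O2 uptake, CO2 output, and R.Q. from expired volume and gas analysis.

    expired_volume should already be reduced to standard conditions
    (0 C, 760 mm, dry); fe_o2 and fe_co2 are expired fractions.
    """
    fe_n2 = 1.0 - fe_o2 - fe_co2
    # Nitrogen is assumed neither taken up nor given off, so
    # inspired_volume * FI_N2 == expired_volume * fe_n2.
    inspired_volume = expired_volume * fe_n2 / FI_N2
    o2 = inspired_volume * FI_O2 - expired_volume * fe_o2
    co2 = expired_volume * fe_co2 - inspired_volume * FI_CO2
    return GasExchange(o2_consumed=o2, co2_produced=co2, rq=co2 / o2)


# Illustrative collection: 60 liters (reduced), 16.50% O2, 3.80% CO2.
result = open_circuit_exchange(60.0, 0.1650, 0.0380)
print(f"O2 {result.o2_consumed:.3f} L, CO2 {result.co2_produced:.3f} L, R.Q. {result.rq:.3f}")
```

For these illustrative figures the sketch gives about 2.76 liters of oxygen, 2.26 liters of carbon dioxide, and an R.Q. near 0.82.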

A useful note with respect to the sampling bottles
this method involves is that by Carpenter and Fox
(127): “The results show no evidence of stratification
. . . and the diffusion was found to be rapid and
complete even under extreme conditions. . . . Once a 
homogeneous sample of air is drawn into the container, 
it may be analyzed subsequently at any time without 
mechanical shaking and without fear of inadequate 
mixing.” 

A description of these methods of analysis will also 
be found in Hawk and Bergheim (256). 

In addition to the Henderson modification mentioned 
above, improvements of the Haldane apparatus have 
been made by Newcomer (448), Gmeiner (217), and 
Martini and Pierach (407). The two latter studies
describe a mechanical mixing device to be used in
connection with the Haldane apparatus. Stoner (544)
describes and illustrates a blank form for recording the data
and calculations, without logarithms, of the
determinations involved in the gasometer method. He also
supplies formulae (543) simplifying the calculations.
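A large share of the routine calculation in the gasometer method lies in reducing the measured volume of gas to standard conditions (0 degrees C, 760 mm. of mercury, dry). The sketch below shows that standard reduction only; it is not a reconstruction of Stoner's particular blank or formulae. The saturated water-vapor pressure is taken as an input, to be read from a table for the gasometer temperature, and the example reading is hypothetical.

```python
def reduce_to_stpd(volume_l: float, temp_c: float, barometer_mm: float,
                   water_vapor_mm: float) -> float:
    """Reduce a saturated gas volume to 0 C, 760 mm Hg, dry.

    volume_l: volume read from the gasometer, liters.
    temp_c: gas temperature, degrees Celsius.
    barometer_mm: barometric pressure, mm Hg.
    water_vapor_mm: saturated water-vapor pressure at temp_c, mm Hg (from a table).
    """
    dry_pressure = barometer_mm - water_vapor_mm  # partial pressure of the dry gas
    return volume_l * (dry_pressure / 760.0) * (273.15 / (273.15 + temp_c))


# Hypothetical reading: 100 L at 20 C and 760 mm; vapor pressure at 20 C about 17.5 mm.
print(round(reduce_to_stpd(100.0, 20.0, 760.0, 17.5), 2))
```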

The same open circuit principle with subsequent gas 
analysis is used in the recent respiration chamber
described by Benedict (cited in the preceding section), and
is also used in the cot calorimeter described by Grafe,
Strieck, and Otto-Martiensen (228). The refined and
delicate gas analysis technique employed was developed
by Carpenter and described by him in 1924 (122).
Grafe, Strieck, and Otto-Martiensen also describe and
illustrate the Carpenter apparatus. According to
Benedict, the Nutrition Laboratory has now definitely
abandoned closed circuit chamber methods in its favor.



Krogh and Lindhard (357) also sponsor an open circuit
method in conjunction with a closed room of known 
volume with accurate subsequent determinations of the 
small changes in the gas concentrations by means of a 
highly sensitive gas analysis technique. 

Other developments from the Haldane apparatus, 
or, more properly, back toward the original Orsat 
device, must also be mentioned. Boigey (93) objects
to Laulanie’s apparatus on the grounds of
nonportability and high price, to Haldane’s on account of
complexity, and to Waller’s on account of its limitation to
carbon dioxide and the absence of any correction or
compensation for the effect of temperature. He describes
the construction and use of his own highly simplified
device, an inexpensive apparatus mounted in a
small carrying case, using potassium hydroxide and
pyrogallol. The device is manufactured and may be
obtained from Pirard. McKinlay (421) simplifies
gas analysis by the easy expedient of using only the
sodium hydroxide side of the Haldane apparatus,
thereby determining only the carbon dioxide
percentage. Hindmarsh (294) compared basal metabolic
rates determined on the basis of oxygen alone, carbon
dioxide alone, and the two gases taken together. He
found that using carbon dioxide alone tended to obscure
certain important differences revealed in a more
complete experiment.

A careful study and comparison of open and closed 
circuit methods was made some years ago by Hendry, 
Carpenter, and Emmes (268). They experimented
with all possible combinations of the Benedict Portable
(closed circuit) and the so-called Tissot, or respiratory
valve, open circuit apparatus, with mouth pieces, nose 
pieces, and half-face masks. Both types of apparatus 
and all three breathing appliances yielded a fair degree 
of accuracy for oxygen consumption but the mask gave 
the lowest carbon dioxide values and respiratory quo- 
tients of the three breathing appliances. Carbon di- 
oxide output was found to fluctuate widely in the case 
of certain subjects when wearing any type of breathing 
appliance and was usually higher with the portable 
apparatus than with the respiratory valve type. A 
series of experiments on the reliability of duplicate 
trials indicated general physiological superiority of 
nose pieces and respiratory valve apparatus, but these 
were not recommended because of technical difficulties. 
The mask was favored for use with the respiratory valve 
apparatus and the mouth piece with the Benedict 
Portable apparatus. The respiratory valve apparatus 
with mask was found best when reliance was placed on 
carbon dioxide elimination alone. 

Open Circuit Methods, Douglas Bag Type. Mechanically
it is a simple matter to substitute a rubber
bag for the gasometer or spirometer in the set-ups
described in the preceding section. The suggestion
that this be done was originally made by Douglas (167)
in 1911, and the method has since grown to wide
popularity under the name of the Douglas Bag Method. It
has been widely used in industrial experiments such as
that of Rosenheim (484) and in laboratory
experiments of a more technical nature such as that of
Cathcart and Stevenson (137). The original note written
by Douglas was sketchy and inadequate, but the method
has been described in considerable detail by Cathcart
(131). Cathcart supplies condensed instructions for
the use of the Haldane analyzer and describes methods
of computation, as well as details in the technique of
using the Douglas bag.

One of the serious limitations of the Douglas bag 
technique as compared with the use of a gasometer is 
that the bag cannot be made of as large capacity as is 
possible with the metal bells. Krogh (354) reports 
that “with violent exercise a bag taking 60 liters will
not hold the air expired during one minute.” This
limitation has been overcome by Campbell, Douglas,
and Hobson (121), using a battery of Douglas bags,
and by Hill, Long, and Lupton (285), who describe a
similar battery of bags for the study of rapidly
changing metabolism. Furusawa (203) describes a
spirometer modification making it possible to study rapidly
changing gaseous metabolism over a long period of
time, as in the bag-battery method just mentioned. A fan
is mounted to operate within the bell of a large
galvanized iron spirometer, and the volumes are read from
a scale beside the falling counter-weight. This has the
advantage of simplicity and the possibility of making
direct volume readings.
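Krogh's remark about the 60-liter bag comes down to simple arithmetic: a bag fills in a time equal to its capacity divided by the minute volume of ventilation, and a battery must contain enough bags to cover the whole collection period. The ventilation rates below are hypothetical, chosen only to illustrate the calculation; they are not figures from Krogh or from Campbell, Douglas, and Hobson.

```python
import math


def minutes_to_fill(capacity_l: float, ventilation_l_per_min: float) -> float:
    """How long a single bag can collect expired air before it is full."""
    return capacity_l / ventilation_l_per_min


def bags_needed(capacity_l: float, ventilation_l_per_min: float, period_min: float) -> int:
    """Number of bags in a battery needed to collect for the whole period."""
    return math.ceil(ventilation_l_per_min * period_min / capacity_l)


# Hypothetical: quiet work at 10 L/min vs. violent exercise at 80 L/min,
# 60-liter bags, 5-minute collection period.
for ventilation in (10.0, 80.0):
    print(ventilation, round(minutes_to_fill(60.0, ventilation), 2),
          bags_needed(60.0, ventilation, 5.0))
```

At the higher hypothetical rate a single bag lasts well under a minute, which is the situation Krogh describes.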

Raper (476) enumerates several disadvantages of the
original valve supplied with the Douglas respiration
apparatus and publishes a diagram of a new valve by
Siebe, Gorman and Company in which these
disadvantages are overcome. Kommerell (345) simplifies the
calculations of the Douglas Bag Method through the
use of nomographic tables. Moss (436) used the
Douglas Bag Method in a study of energy costs of
work at high air temperatures but computed the energy
expenditure not on a theoretical basis, as basal
metabolic rates are determined, but by comparing results
from the unknown task with results obtained in an
ergometer study of the same individual. A
physiological deficiency in his methods is seen in the fact that his
subjects were required to inspire through a mouth-piece
connected by a three-foot tube to a gas meter, and to
expire through a three-foot tube connected with the
Douglas bag.

Bedale (41) apparently used the Douglas bag in an 
experiment in which samples were collected for less 
than one minute and based her conclusions on deter- 
minations of the oxygen consumption alone. Her 
results appear to be highly significant, however, in 
spite of the fact that carbon dioxide production and 
respiratory quotients were not determined. A more 
serious curtailment in technique would have been the 
ignoring of oxygen consumption data and reliance 
upon carbon dioxide elimination alone. As will be 
seen from the section on the Waller Method there is 
use for such a simplification in certain industrial prob- 
lems, but it should not be resorted to in investigations 
of a more refined sort. 

There is still some question as to the reliability of 
the Douglas bag technique even when used in its com- 
plete form, so the experimenter should be very wary 
of simplified procedures in experiments in which the 
fluctuations of the variable being studied are of low 
magnitude. We shall close our discussion of the Douglas
bag method with quotations favorable to the method
and others opposed. 

Carpenter (123) finds the Douglas bag reasonably 
satisfactory for short periods of time provided the bag 
is of 100-liter capacity. He advises pneumatic nose-pieces
when the bag is used with untrained, disinterested
subjects. Greenwood, Hodson, and Tebb
(231) used the Douglas Bag-Haldane Method in
studying the working metabolism of a fairly large
number of women. They were able to verify the
Douglas-Haldane technique in the sense of
demonstrating uniformity of results among moderately trained
analysts. Dautrebande and Davies (155) are critical
of closed circuit methods for clinical use and describe
in detail the use of the Douglas-Haldane Method,
which they prefer for use in the clinic. Hill and
Lupton (287) employed the Douglas bag method in 
an experiment on muscular work in which they re- 
ported that a steady state was reached within two 
minutes. 

On the other hand, Cathcart, Lothian, and Greenwood
(134) discuss statistically the validity of Douglas-Haldane
observations and publish a series of 96
observations of the same subject under uniform condi- 
tions. Twelve per cent of the cases were found to lie 
beyond three times the probable error. The “error” 
is compound, that is, it is dependent both upon ex- 
perimental technique and physiological fluctuations. 
Pickworth (465) even found that basal metabolic rates 
determined in a chamber were more than 20 per cent 
lower than those determined by the Bag method. He
believed that this might be due to the lesser emotional 
disturbance and the increased possibility of muscular 
relaxation. Wishart (621) found that the maximum 
and minimum of a series of observations of a single 
individual by the Douglas Bag method may differ by 
as much as 30 per cent. It is also significant to note 
that Sargent (497) found that the total oxygen con- 
sumption during recovery cannot be estimated by 
observing partial recovery and applying a constant 
correction. 
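The weight of the 12 per cent figure reported by Cathcart, Lothian, and Greenwood becomes clearer with a short calculation. If the errors were normally distributed, the probable error would equal about 0.6745 standard deviations, so three probable errors correspond to about 2.02 standard deviations, and only about 4 per cent of observations (roughly 4 of 96) would be expected to lie beyond that limit. The assumption of normality is introduced here for illustration; the source does not state it.

```python
import math

PE_IN_SIGMA = 0.6745  # probable error as a multiple of the standard deviation (normal law)


def fraction_beyond(k_probable_errors: float) -> float:
    """Two-tailed fraction of a normal distribution beyond k probable errors."""
    z = k_probable_errors * PE_IN_SIGMA
    # P(|X| > z) for a standard normal variable equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


expected = fraction_beyond(3.0)
print(f"expected {expected:.1%}, about {expected * 96:.1f} of 96; observed 12%")
```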

The Waller Simplification of the Douglas Bag 
Method. Having noted the limitations of the Douglas 
Bag method as well as its advantages, we are in a posi- 
tion to consider a simplification proposed by A. D. 
Waller several years ago and used by him in a number 
of studies of an industrial nature. The method was 
proposed only as an approximation method but was 
developed on the grounds that it provided the only 
possible way of studying respiratory exchange of in- 
dustrial workers actually engaged in the performance 
of their duties. The method has fallen into disuse in the
past few years, and the present writer would hesitate to
revive it were it not for the fact that the method, if
its validity can be established even within rather low limits of
accuracy, will provide the industrial psychologist and
the industrial engineer with the most usable of all tools
available for the study of working conditions, rest
pauses, and related industrial factors such as were
described in the introductory section of the present
work. We shall enumerate the actual studies made by
Waller and then proceed to a discussion of the evidence
for and against the validity of the Waller Method and
of the material available for judging the limitations to be
imposed upon the interpretation of results secured from
the method.

Because of the rather wide disregard Waller’s
methods have encountered, there are many people in
this country who are unacquainted with his work and
do not associate the Waller Method with the name of
the well-known physiologist who achieved greater
distinction for some of his other contributions to science.
In his obituary (604) we learn that he was the son of
A. V. Waller, a celebrated English physiologist, and
that in 1892 he was made a Fellow of the Royal
Society, being at that time 36 years of age. He was the
first to obtain a human electrocardiogram, accomplishing
this in the eighties, long before the introduction of
the string galvanometer. He was also well known for
his later work with the psychogalvanic reaction, as well
as for his sponsoring of the method of energy consumption
bearing his name. The method met criticism during
his own later years, but death cut off his project to
establish the validity of the method in a carefully
controlled experiment of some magnitude.

Most of Waller’s own reports on his experiments are
brief, and there is more or less repetition in
the description of method. In an early study (584) he
states that he uses a 20-liter rubber pillow fitted with a
valve and two-way tap, opened and closed at a signal
from an assistant with a stop-watch. A mouth-piece and
nose-clip are used, the bag being held by the subject.
Expired air is collected for a period of only 30 to 60
seconds, the workmen’s activities being interrupted for
that period. In this paper Waller supplies statistics
and a graph showing carbon dioxide ordinates at
one-minute intervals before, during, and after stair
climbing in justification of his procedure of collecting the expired
air immediately upon interrupting the men’s work. The
energy consumption calculations are based upon carbon
dioxide determinations, with an assumed respiratory
quotient of about 0.84. Details of the computation are
given in another article in the same volume of the same
journal (582), and he also explains his method of
assigning mechanical work equivalents to volumes of
carbon dioxide. He contends, for example, that, at
an R.Q. of 0.84, one cubic centimeter of carbon dioxide
is equivalent to 2.5 kilogram-meters. This description
of his method seems to have been favorably reviewed
by Stiles in the Abstract of the Literature of Industrial
Hygiene, Volume 1, page 41.
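Once the respiratory quotient is fixed, Waller's conversion can be applied mechanically. The sketch below takes the equivalent quoted above, 2.5 kilogram-meters per cubic centimeter of carbon dioxide at an R.Q. of 0.84, and converts to heat units with the mechanical equivalent of heat (about 426.8 kilogram-meters per kilocalorie). The function name and the 30-second sample are hypothetical, and the result is only as good as the assumed R.Q.

```python
KG_M_PER_CC_CO2 = 2.5   # Waller's equivalent at an assumed R.Q. of 0.84
KG_M_PER_KCAL = 426.8   # mechanical equivalent of heat


def waller_energy(co2_cc: float) -> tuple[float, float]:
    """Return (kilogram-meters, kilocalories) equivalent to a volume of CO2 in cc."""
    work_kg_m = co2_cc * KG_M_PER_CC_CO2
    return work_kg_m, work_kg_m / KG_M_PER_KCAL


# Hypothetical 30-second sample containing 300 cc of carbon dioxide.
co2_cc, seconds = 300.0, 30.0
work, kcal = waller_energy(co2_cc)
rate_kcal_per_hour = kcal * 3600.0 / seconds
print(f"{work:.0f} kg-m, {kcal:.2f} kcal in {seconds:.0f} s, "
      f"about {rate_kcal_per_hour:.0f} kcal per hour")
```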

Data useful in preparing graphs of carbon dioxide 
output over short periods are supplied in a later con- 
tribution (587). In another paper (596) he proposes 
a chart defining various types of work in terms of car- 
bon dioxide discharge. Sedentary work, for example, 
is described as work in which less than five cubic centi- 
meters are expired per second, this signifying an energy 
consumption of less than 100 calories per hour. At the 
other end of the scale is placed the heaviest of manual 
labor, with a carbon dioxide exhalation of more than 20 
cubic centimeters per second, or an energy consumption 
of over 400 calories per hour. 
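Waller's scale can be expressed as a simple lookup. Only its two end points are given above (under 5 cubic centimeters per second for sedentary work, over 20 for the heaviest manual labor), so this sketch lumps everything between them into one intermediate band rather than guessing at his intermediate categories. It also converts the rate to kilocalories per hour with the 2.5 kilogram-meter equivalent, which roughly reproduces his round figures of 100 and 400 calories per hour at the two boundaries. The function names are hypothetical.

```python
KG_M_PER_CC_CO2 = 2.5   # Waller's equivalent at R.Q. 0.84
KG_M_PER_KCAL = 426.8   # mechanical equivalent of heat


def kcal_per_hour(co2_cc_per_s: float) -> float:
    """Energy consumption implied by a steady CO2 discharge rate."""
    return co2_cc_per_s * 3600.0 * KG_M_PER_CC_CO2 / KG_M_PER_KCAL


def waller_category(co2_cc_per_s: float) -> str:
    """Classify work by CO2 discharge, using only the end points of Waller's chart."""
    if co2_cc_per_s < 5.0:
        return "sedentary"
    if co2_cc_per_s > 20.0:
        return "heaviest manual labor"
    return "intermediate (not subdivided here)"


for rate in (4.0, 5.0, 12.0, 20.0, 25.0):
    print(rate, waller_category(rate), round(kcal_per_hour(rate)))
```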
Articles 585 and 595 in our bibliography are replies
to objections to his method and will be referred to again
in the next section. In another study (598) he reports
an experiment performed upon one subject in which
energy consumption was estimated both by the usual method
and by the Waller Method. A reasonably close
correlation between the two was found. The method is
defended further (602) in his contention that the error
of the Douglas-Haldane method of measuring
physiologic cost is plus or minus one per cent and that under
the method of measuring the carbon dioxide alone for
brief periods, as he proposes, it is increased to only plus
or minus 5 per cent, provided care is taken to make all
measurements after a steady regime is established. With
sampling of this kind there is practically no interruption of the
industrial processes on which the worker is engaged,
and the sampling proceeds practically under actual
conditions of work. Many samples per day may be taken
by one operator, and if the results are treated
statistically it is possible to derive conclusions of significance
in industry, since the effect of at least some of the
variables of industrial interest is likely to be of a magnitude
at least three times this probable error.
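The statistical argument in this paragraph can be made concrete. If each Waller determination carries a probable error of about 5 per cent, and if the errors of separate samples are independent (an assumption of this sketch, not something demonstrated in the source), the probable error of a mean of n samples falls as the square root of n. A difference between two conditions is then judged significant when it exceeds three times the probable error of the difference. The sample counts and the 8 per cent difference below are hypothetical.

```python
import math


def pe_of_mean(pe_single: float, n: int) -> float:
    """Probable error of the mean of n independent determinations."""
    return pe_single / math.sqrt(n)


def pe_of_difference(pe_a: float, pe_b: float) -> float:
    """Probable error of the difference between two independent means."""
    return math.hypot(pe_a, pe_b)


def is_significant(difference: float, pe_diff: float, criterion: float = 3.0) -> bool:
    """Apply the conventional 'three times the probable error' criterion."""
    return abs(difference) > criterion * pe_diff


# Hypothetical comparison: 10 samples under each of two working conditions,
# each sample with a 5 per cent probable error; observed difference of 8 per cent.
pe_a = pe_of_mean(5.0, 10)
pe_b = pe_of_mean(5.0, 10)
pe_d = pe_of_difference(pe_a, pe_b)
print(f"PE of difference {pe_d:.2f} per cent, significant: {is_significant(8.0, pe_d)}")
```

With single samples the same 8 per cent difference would not pass the criterion, which is why the paragraph stresses many samples treated statistically.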

Readers of this and other articles will find that 
Waller's curves all show hourly increase in carbon 
dioxide during each period of work, a restorative effect 
in the dinner hour, a rapid fall in carbon dioxide dis- 
charge during a rest pause of three or four minutes, 
and a higher maximum in the afternoon than in the 
morning. In this same paper he discusses the physi- 
ology of this progressive increase in carbon dioxide 
discharge in continuous muscular work and mentions
his intention of treating the subject in greater detail in
a later paper. This apparently was never done. Waller
himself regarded the phenomenon as a progressive in- 
crease in the energy cost of continuous work, although 
other physiologists have denied the physiological effect 
and expressed doubt concerning his explanation. 

Waller believed that his method could be used in 
estimating energy cost of all types of muscular work 
from the heaviest to the lightest. Numbers 581, 583, 
589, 601, and 603 of our bibliography are all reports 
of studies of heavy work such as dock labor and work 
in cold-storage plants. In these articles will be found 
repetitions of his descriptions of method and computa- 
tions. His contention that energy cost of work is in- 
creased at the close of a long spell of uninterrupted 
work was supported in his observation (589) that the 
cost of walking a given distance was doubled at the end 
of a working period as compared with the beginning 
of the same period. 

Studies 586, 591, 597, and 600 of our bibliography 
are all concerned with the energy cost of walking and 
marching. There are further explanations in these 
references of his experimental and computation meth- 
ods. 

Studies 592, 593, and 594 are concerned with the
physiological cost of work in various departments of a 
printing plant and Number 590 is on the physiological 
cost of tailors’ work. Curves are published showing 
very neatly the typical characteristics we have already 
noted in preceding paragraphs. 
Other miscellaneous studies are Numbers 588, in
which bicycle ergometry is shown to be more costly
than stair-case work, and 599, in which the relative cost
of swimming at different rates is determined.

Criticism and Defense of the Waller Method. Hill and
Campbell (291) criticize the Waller Method of determining
energy cost in industrial work and cite an
experiment in which contradictory results would be 
obtained by using the Waller Method instead of the 
complete Douglas-Haldane computation. They re- 
gard this discrepancy as evidence that the respiratory 
quotient must always be computed in work experiments 
and never assumed. These authors even go so far as 
to express skepticism over Waller’s experimental find- 
ing that there is a steady rise in carbon dioxide output 
during continuous work. About the same time, Orr 
and Kinloch (453) added a further criticism of the 
neglect of the respiratory quotient in the Waller
Method. Waller answered these two critical papers 
together, but we shall consider critical attacks by other 
experimenters before turning to his replies. 
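The objection of Hill and Campbell, and of Orr and Kinloch, can be illustrated numerically. The heat equivalent of a liter of carbon dioxide depends on the respiratory quotient, because it equals the heat equivalent of a liter of oxygen divided by the R.Q. The sketch below approximates the familiar table of oxygen equivalents by a straight line between 4.686 kilocalories at R.Q. 0.707 and 5.047 at R.Q. 1.00 (an interpolation assumed here for brevity). It then shows how far an estimate made at an assumed R.Q. of 0.84 departs from one made at the true quotient.

```python
# ASSUMED end points of the standard oxygen-equivalent table (kcal per liter O2).
RQ_LOW, KCAL_LOW = 0.707, 4.686
RQ_HIGH, KCAL_HIGH = 1.00, 5.047


def kcal_per_liter_o2(rq: float) -> float:
    """Linear interpolation of the heat equivalent of one liter of oxygen."""
    if not RQ_LOW <= rq <= RQ_HIGH:
        raise ValueError("R.Q. outside the tabulated range")
    slope = (KCAL_HIGH - KCAL_LOW) / (RQ_HIGH - RQ_LOW)
    return KCAL_LOW + (rq - RQ_LOW) * slope


def kcal_per_liter_co2(rq: float) -> float:
    """Heat equivalent of one liter of CO2: the oxygen equivalent divided by the R.Q."""
    return kcal_per_liter_o2(rq) / rq


def relative_error_of_assumed_rq(true_rq: float, assumed_rq: float = 0.84) -> float:
    """Fractional error of a CO2-only energy estimate made with an assumed R.Q."""
    return kcal_per_liter_co2(assumed_rq) / kcal_per_liter_co2(true_rq) - 1.0


for true_rq in (0.707, 0.84, 1.00):
    print(true_rq, f"{relative_error_of_assumed_rq(true_rq):+.1%}")
```

On this approximation, assuming 0.84 underestimates the energy by roughly 13 per cent when the true quotient is at the fat end of the range and overestimates it by roughly 14 per cent at the carbohydrate end.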

Gairns and O’Brien (206) published in the following
year the results of the only experiment ever performed in
direct refutation of the validity of his method. These 
investigators ran several types of experiments and made 
a serious effort to check the various claims made by 
Waller concerning his technique. These authors evi- 
dently anticipated that their results would permanently 
and effectively stop all further experimentation with 
the method, and indeed they were right, since their 
work was published some months after Waller’s death 
and thus at a time when the method had no interested
champion to defend it. The results obtained by Gairns 
and O’Brien certainly indicated a probability that the 
Waller Method cannot be as widely and as casually 
used as its originator thought possible, although we be- 
lieve that the evidence which will be presented in 
succeeding sections of the present discussion will in- 
dicate that the situation is not so hopeless as these 
authors have supposed. 

Greenwood and Newbold (232) are also severely 
critical of the method. They review important experi- 
ments utilizing the method and criticize especially 
Waller’s advocacy of using carbon dioxide measure- 
ment alone. Their paper consists partly of a statistical 
treatment of Benedict and Cathcart’s monograph 
which demonstrates the wide fluctuation of carbon 
dioxide output, and hence its low predictive value. 
These writers, moreover, even go on to demonstrate 
statistically that determinations based upon oxygen con- 
sumption or even upon oxygen and carbon dioxide to- 
gether are but little more reliable than experiments 
based on carbon dioxide production alone. They con- 
clude that “the practical conclusion seems, therefore, to 
be that, when any experimental calibration of different 
forms of muscular work is based on the confrontation of 
small sample (respiratory) measurements upon dif- 
ferent subjects, only the roughest results are obtain- 
able.” Their rather involved arguments from 
regression equations, variation coefficients, etc., lead 
them to the conclusion that even under the best of 
present methods we can do little more than distinguish 
between the energy costs of sedentary occupations and
heavy manual labor. Pickworth (465), as we have
already noted, found that the Douglas Bag Method
yields basal metabolic rates which are too high by as
much as 20 per cent, so if full dependence is to be
placed upon the two studies last named we should be
forced not merely to reject the Waller Method and
other simplifications but also to forgo even the
use of the standard Douglas bag technique, which has
found worldwide use in countless experiments.

Most of the criticism we have been considering has 
been based on statistical treatments of experiments in 
which oxygen consumption and carbon dioxide figures 
are available, so that results may be computed by the 
full method as well as by means of Waller's simplifica- 
tion. There are other criticisms of a more theoretical 
sort, particularly with regard to reliance upon carbon 
dioxide. The question is a live one in basal metabolic 
rate determination as well as in computing energy costs 
of work. The typical criticism of the use of carbon 
dioxide alone in respiratory exchange experimentation 
will be found in DuBois (172).

Any unbiased reader who goes through the various 
criticisms we have just mentioned will inescapably con- 
clude that the Waller Method is utterly useless in any 
kind of experimentation and is even likely to doubt the 
value of any type of respiratory exchange determination
even under the most refined techniques. Waller
himself lived only to answer the early objections of 
Hill and Campbell, quoted above, and those of Orr and 
Kinloch. Three principal objections were raised by 
these authors, each of which is discussed quite completely
by Waller (595). The first objection was that
one-half minute is too short a sample for reliable re- 
sults, the second that the neglect of the food factor 
introduces serious error, and the third that the respira- 
tory quotient must be determined rather than assumed. 
Waller makes answer to all of these points and shows 
also the advantages inherent in his method for actual 
study in industry. We shall not repeat his arguments 
here but shall add further evidence on these various 
points as gained from other authors. 

Waller was able to give his method empirical 
confirmation in the above article by reports of minor 
experiments with parallel Waller and Douglas-Hal- 
dane computations. A study of marching (598) in 
which energy consumption was estimated by both 
methods demonstrated a reasonably accurate correla- 
tion between the Waller Method and the full Douglas- 
Haldane procedure. 

It appears to the present reviewer that much of the 
criticism which has been directed toward the Waller 
Method would lose force if the claims for the method 
were made a little less pretentious. A considerable 
body of evidence has accumulated to support the view 
that the physiological adjustment to light work is an 
altogether different matter from the adjustment to 
heavy work, and that, while the Waller Method may 
be demonstrated to be unreliable for heavy work de- 
terminations, such demonstration does not invalidate 
its use in connection with continuous light work such 
as is ordinarily found in American industry. Modern 
engineering has so reduced the heavy manual labor in
American industry that we are not concerned so much
with differences in energy cost of various methods of
lifting, pulling, and pushing heavy loads as we are with
differences in the adjustment of the work bench and
factors of a like nature. It is useful, then, for us to 
consider separately the physiology of light and heavy 
muscular work, knowing that the Waller Method or 
any other technique which may be applied to light 
work loses little in utility even if it cannot be applied to 
heavy manual labor. 

Hill (284), in a discussion on page 16, apparently 
confirms the feasibility of using carbon dioxide output 
when studying long periods of light work, and Viale 
(575 and 577) found the carbon dioxide discharge
proportionate to the external work provided the task was
light and continuous. He and Briggs (112) both speak
of an individual maximum carbon dioxide discharge
which is not exceeded no matter how greatly the
external work is increased, and Briggs even finds that
“when exertion of steadily increasing magnitude is
undertaken the expired carbon dioxide percentage first
rises and then falls.”

Other important physiological studies of the dif- 
ference between light and heavy work are those of 
Hough (298) and Hill, Long, and Lupton (285). In 
this latter study again there are indications that carbon 
dioxide output may be used as an index of moderate 
exercise but not for heavy labor. Becker and Olsen 
(39) even find that (page 195) “in the case of minor 
muscular work, when the subject takes up the same 
position during the entire experiment and works with
unhindered respiration, the value of the increase of
metabolism may approximately be expressed by the
increase of the respiratory volume.” This they hold to
be true because the carbon dioxide percentage of 
alveolar air under such circumstances remains prac- 
tically constant. 
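The reasoning attributed to Becker and Olsen rests on the fact that carbon dioxide output is the product of the respiratory volume and the carbon dioxide fraction of the expired air. To the extent that a constant alveolar percentage keeps the expired percentage constant as well (an assumption of this sketch), the ratio of working to resting carbon dioxide output equals the ratio of the respiratory volumes. The volumes and the 3.5 per cent figure below are hypothetical, and the small carbon dioxide content of inspired air is ignored.

```python
def co2_output(ventilation_l_per_min: float, fe_co2: float) -> float:
    """CO2 output in L/min, neglecting the CO2 in inspired air."""
    return ventilation_l_per_min * fe_co2


def metabolic_increase_ratio(rest_ventilation: float, work_ventilation: float) -> float:
    """Ratio of working to resting CO2 output when the expired CO2 fraction is unchanged."""
    return work_ventilation / rest_ventilation


# Hypothetical: 7 L/min at rest, 14 L/min during light work, expired CO2 fixed at 3.5%.
rest = co2_output(7.0, 0.035)
work = co2_output(14.0, 0.035)
print(round(work / rest, 3), metabolic_increase_ratio(7.0, 14.0))
```

If the expired percentage changes during work, as Briggs reports for heavy exertion, the simple ratio no longer holds.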

Sargent (497) demonstrates that the recovery rate 
following severe exercise is highly variable and also 
points out that there is an actual elevation in the basal 
metabolic rate following such exercise. Both these 
factors would tend to invalidate computations of energy 
cost of heavy muscular work by any of the respiratory 
exchange methods. The experiments of Campbell, 
Douglas, and Hobson (121) are useful in connection 
with the theory underlying the Waller Method and 
their results also appear to be negative to that method 
for use with brief periods of heavy work. In spite of 
these theoretical objections, however, experimental de- 
terminations of energy cost of heavy manual labor are 
being made on the basis of carbon dioxide production 
and are being reported in the journals. See, for ex- 
ample, the study by Gullichsen and Soisalon-Soininen 
(236). 

Assuming that the references quoted in the last few 
paragraphs demonstrate the necessity of a separate 
treatment of the physiology of light and heavy muscu- 
lar work and that respiratory exchange determinations 
adaptable to the one may be unsuited for the other, we 
may now consider separately the various objections 
raised to the Waller Method with the idea in mind that 
its application will probably need to be restricted to
light industrial work and should not embrace all types 
of manual labor as originally advocated. 

Let us consider first the validity and reliability of 
taking a brief respiratory sample. Respiratory samples
of much shorter duration than those employed by Wal- 
ler were used in an experiment by Krogh (349) in 
1913. “In the series of experiments on the initial stages 
of muscular work we found it necessary to make three 
distinct determinations of ventilation and respiratory 
exchange during the first minute of work. . . , The 
accuracy of such a respiration experiment lasting gen- 
erally from three to six seconds is not great, certainly. 
The error may easily amount to ten or even twenty per 
cent, but we have found the method efficient for our 
purposes and no other appeared possible.” He refers 
again to these experiments in his manual written in 
1916 (354). “With violent exercise a bag taking 60 
liters will not hold the air expired during one minute, 
but it has been shown (1913) that experiments of much 
shorter duration are sufficient to give perfectly reliable 
results.” He asserts again (352) “that one can secure 
full and dependable estimation of an individual’s rest- 
ing metabolism with experiments of very short dura- 
tion.” 
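Krogh's estimate that a three- to six-second experiment may be in error by ten or twenty per cent can be made concrete for one source of error only: when a short collection is scaled up to a per-minute rate, any error in timing the collection passes into the rate almost undiminished. The sketch below assumes the collected volume itself is measured exactly; the function names and the 0.3-second timing error are hypothetical, not drawn from Krogh.

```python
def rate_per_minute(volume_liters: float, duration_s: float) -> float:
    """Scale a volume collected over duration_s seconds to liters per minute."""
    if duration_s <= 0:
        raise ValueError("collection time must be positive")
    return volume_liters * 60.0 / duration_s


def relative_timing_error(duration_s: float, timing_error_s: float) -> float:
    """First-order relative error in the per-minute rate caused by a timing error alone."""
    return timing_error_s / duration_s


# Hypothetical 0.3 s error in starting or stopping the collection.
for seconds in (3.0, 6.0, 30.0, 60.0):
    pct = 100 * relative_timing_error(seconds, 0.3)
    print(f"{seconds:4.0f} s sample: about {pct:.1f}% error in the rate")
```

The same absolute timing error weighs ten times as heavily on a six-second sample as on a one-minute collection, which is one reason very short samples call for careful timing.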

Bedale (41) contends that accurate determinations 
can be made from samples collected for less than one 
minute and, furthermore, that it is thus possible to
obtain true samples of the working period. She
determined oxygen consumption in her experiments,
however, rather than carbon dioxide output as in the
Waller Method. Aitken and Clark-Kennedy (5, 6), in
very recent investigations, have demonstrated the pos- 
sibility of making reliable studies even of the various 
parts of a single respiratory cycle. 

One of the alleged dangers of using short respiratory 
samples is the possibility of securing artificially high 
carbon dioxide output through over-ventilation or 
Auspumpung. King and Cross (323) conducted a careful
investigation into the extent of such possible washing
out of carbon dioxide and found that this can occur
only once in successive experiments, so that if two de- 
terminations are made one can detect over-ventilation 
through failure of the two sets of data to check. This 
refers, of course, to the use of carbon dioxide determin- 
ation in finding basal metabolic rates, but there should 
be little difficulty in working out similar check experi- 
ments when determining energy metabolism. It is 
Waller’s contention that an experienced operator can 
detect Auspumpung through careful observation of the 
subject, and he asserts further that most subjects soon 
overcome any tendency toward forced breathing. Sub- 
jects who are unable to overcome this tendency would, 
of course, need to be excluded from experiments. 
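King and Cross's check lends itself to a simple rule: because washing-out of carbon dioxide can occur only once, an inflated first determination will fail to agree with the second. The sketch below flags a pair of determinations whose difference exceeds a chosen fraction of their mean; the function name and the 5 per cent default tolerance are hypothetical, not figures reported by King and Cross.

```python
def duplicates_agree(first: float, second: float, tolerance: float = 0.05) -> bool:
    """Return True if two successive CO2-output determinations agree within
    `tolerance`, expressed as a fraction of their mean.

    The 5 per cent default is an illustrative assumption only.
    """
    mean = (first + second) / 2.0
    if mean <= 0:
        raise ValueError("CO2 outputs must be positive")
    return abs(first - second) / mean <= tolerance


# A first determination inflated by over-ventilation fails to check.
print(duplicates_agree(250.0, 255.0))  # close agreement
print(duplicates_agree(320.0, 250.0))  # suspect Auspumpung
```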

Henderson and Haggard (265) report that trained 
athletes do not blow off excess carbon dioxide even in 
maximum exertion or afterward, but that this is not 
true of untrained subjects. This fact should be of con- 
siderable significance since most of the subjects upon 
whom the Waller Method would be used would probably
be in a better than average state of training.
King has done still other work (see 323 above) in
support of the use of carbon dioxide in basal metabolic
rate determination. Such studies have at least an
indirect bearing upon the use of carbon dioxide figures
alone in work experiments, and are thus worthy of
notice here. In one study (322) he demonstrated from
the figures of Benedict and Carpenter, and Soderstrom, 
Meyer, and DuBois that there is a somewhat higher 
coefficient of correlation between carbon dioxide and 
measured calories than between oxygen and measured 
calories. In his manual (321) there is a further de- 
fense of the carbon dioxide method, which he believes 
to be at least as accurate for basal rate determinations 
as determinations based upon oxygen consumption. 
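King's comparison amounts to computing two coefficients of correlation, one between carbon dioxide and measured calories and one between oxygen and measured calories, and seeing which is the larger. The sketch below assumes the ordinary product-moment (Pearson) coefficient; no figures from Benedict and Carpenter or from Soderstrom, Meyer, and DuBois are reproduced, and the function name is illustrative.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Product-moment coefficient of correlation between paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series of at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Usage, given paired series from a published table:
#   r_co2 = pearson_r(co2_liters, measured_calories)
#   r_o2 = pearson_r(o2_liters, measured_calories)
# King's claim corresponds to r_co2 being somewhat higher than r_o2.
```

Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient.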
McKinlay (421) also experimented with carbon di- 
oxide methods in basal rate determinations, finding in 
26 cases that the basal metabolic rates varied 3 per cent 
on the average from those computed by the full method 
and that in 33 other cases the oxygen consumption
computed from carbon dioxide output differed by 4.8 per
cent from the measured oxygen consumption of the
same cases. The greatest fluctuation was found with
diabetic patients and certain other pathological cases. 

Rabinowitch and Bazin (473) have also claimed
eminently satisfactory clinical results from the use of
carbon dioxide determination alone, and still further
discussion of the usefulness of carbon dioxide
determinations in the clinic will be found in the
contributions by Sainton and Peron (498).

Hendry, Carpenter, and Emmes (268), however, 
find some objection to reliance upon carbon dioxide in 
the clinic and recommend the use of face mask and
respiratory valve to reduce the tendency toward wide 
fluctuation found in certain subjects. 

Turning now from studies bearing on the use of 
carbon dioxide determinations in the clinic, we find 
that such measurements have had a long and honorable 
history in experiments in working metabolism. Dis- 
regarding the very early experimentation on the part of 
Lavoisier and his successors, we may note that the early 
tradition of the Swedish school, for example Johans- 
son (310), and Johansson and Koraen (311), in using 
carbon dioxide alone in the study of positive, negative, 
and static work, has been carried on down practically 
to the present time. 

Another early study is that of Higley and Bowen 
(278) who found that “the output of carbon dioxide 
from the lungs is practically uniform from minute to 
minute during uniform muscular work, if the blood 
has had time to take part fully in the process of elimina- 
tion.” They mention further that upon cessation of 
work the output of carbon dioxide decreases to the 
normal amount in about two minutes but not until after 
a latent period of about 20 seconds. Since the samples 
taken under the Waller technique are collected within 
the first 20 to 30 seconds, their findings would seem to 
support the use of this method. Krogh (352) also 
approves of the use of carbon dioxide alone in Douglas 
bag experiments in which simplicity and saving of time 
are of greater importance than extreme accuracy. 

Benedict and Johnson (77) used and described a 
large chamber for the studying of groups of thirty to 
forty persons by means of carbon dioxide elimination
alone. “The determination of the carbon dioxide pro- 
duction has of itself but little value, but is used for 
computing the heat production by means of the calori- 
fic value of carbon dioxide at an assumed respiratory 
quotient. Such a method is admittedly not so accurate 
as the direct measurement of the heat production or 
the computation of the heat production from measure- 
ments of the oxygen consumption. It is, however, a 
rapid method and not too costly for determining the 
approximate heat output of a group of people.” Their 
measurements, taken two hours after a light breakfast, 
indicated clear separation between occupations involv- 
ing various degrees of muscular activity. For example, 
there was a 3 per cent increase in metabolism while 
reading aloud as compared with silent reading and a 
13 per cent increase in metabolism in light hand-sew- 
ing as compared with reading quietly. Boigey (95 and 
97) is a well-known French physiologist who has suc- 
cessfully employed methods similar to the Waller 
Method in experiments on muscular exercise. He re- 
ports that “the variations of oxygen consumption and 
of carbon dioxide production are each proportional to 
the intensity of the work." 
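The computation Benedict and Johnson describe, heat production from carbon dioxide at an assumed respiratory quotient, can be sketched as follows. The caloric equivalents of oxygen are approximate values as commonly tabulated for the non-protein R.Q. (after Zuntz and Schumburg); they, and the R.Q. passed in, are assumptions of this sketch, since the text does not say which quotient Benedict and Johnson used. A liter of carbon dioxide is worth the oxygen equivalent divided by the R.Q.

```python
# Approximate kcal per liter of oxygen at selected non-protein R.Q.s
# (commonly tabulated values; an assumption of this sketch).
KCAL_PER_LITER_O2 = {0.80: 4.801, 0.85: 4.862, 0.90: 4.924, 0.95: 4.985, 1.00: 5.047}


def kcal_per_liter_co2(rq: float) -> float:
    """Caloric value of one liter of CO2: (kcal per liter of O2) / R.Q."""
    key = round(rq, 2)
    if key not in KCAL_PER_LITER_O2:
        raise ValueError(f"no tabulated value for R.Q. {rq}")
    return KCAL_PER_LITER_O2[key] / key


def heat_from_co2(co2_liters: float, assumed_rq: float) -> float:
    """Heat production (kcal) estimated from carbon dioxide output alone."""
    return co2_liters * kcal_per_liter_co2(assumed_rq)


def percent_increase(activity: float, baseline: float) -> float:
    """Percentage rise in metabolism of one occupation over another."""
    return 100.0 * (activity - baseline) / baseline
```

Comparisons such as reading aloud against silent reading are then `percent_increase(heat_aloud, heat_silent)`, where both heats come from `heat_from_co2` at the same assumed R.Q.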

In 1922 (94) he wrote an enthusiastic account of the
use of the Waller Method in appraising the energy 
cost of different types of athletic sports. Meters and 
bags are pictured and described, as is also the special 
carbon dioxide analysis apparatus in portable form, 
as devised and used by Waller. He agrees with Waller 
in discounting the effect of the respiratory quotient as 
a matter of practice in the experiments for which the
method was intended. 

An experimental investigation of the relation be- 
tween gaseous exchange and respiration in severe 
muscular work is reported by Herxheimer and Kost
(272), who found that carbon dioxide production is
proportional to the ventilation but that this is not true 
of the oxygen consumption. 

The reader who has covered the material on the 
respiratory quotient, as outlined in the section under 
that heading, will realize that the modern conception 
of the respiratory quotient is somewhat different from 
that which prevailed at the time of the criticism of the 
Waller Method because of its neglect of the R.Q. 
More is known now about the ratio ordinarily to be ex- 
pected, both as to its stability, and the range within 
which it may fluctuate. Along with this greater empiri- 
cal knowledge has come a less dogmatic view of the 
theoretical significance of the ratio. 

The subject is so complex that it is impossible to 
prove or disprove any point on the basis of one set of 
experimental data alone. For example, there are 
studies such as that of Benedict, Emmes, and Riche 
(74) indicating a rise in the R.Q. following a carbo- 
hydrate meal, against which may be set the assertion 
from Benedict and Murschhauser (79) that “the heat 
production per unit of work is practically independent 
of the taking of food.” Again, Cathcart and Burnett
(133) report a rise in the R.Q. during work, whereas 
Henderson and Haggard (265) find the working R.Q. 
to be the same as in the preceding rest state. 
The immense amount of work by Hill (281) and his
collaborators, as well as the evidence of Krogh and 
Lindhard (356, 357) and other experimenters, 
establishes a strong presumption that the working R.Q. 
is fairly close to unity. It seems safe to assume from 
these studies and from the recent investigations of 
Bock, van Caulaert, Dill, Fölling, and Hurxthal (91)
that if the R.Q. is treated as constant at about .95 there
will be little likelihood of introducing serious error
through failure to compute the R.Q. Smith (525)
found the majority of R.Q.'s between .80 and .95 with
very few in which the deviations were extreme. 
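The size of the error introduced by treating the R.Q. as a constant depends on how far the actual quotient departs from the assumed one. The sketch below reuses approximate tabulated caloric equivalents of oxygen (an assumption, not data from the studies cited) and reports the fractional error in heat production when carbon dioxide is converted at an assumed R.Q. of .95; a negative value means heat production is underestimated.

```python
# Approximate kcal per liter of oxygen at selected non-protein R.Q.s
# (commonly tabulated values; an assumption of this sketch).
# Only these tabulated R.Q. values are accepted as keys.
KCAL_PER_LITER_O2 = {0.80: 4.801, 0.85: 4.862, 0.90: 4.924, 0.95: 4.985, 1.00: 5.047}


def kcal_per_liter_co2(rq: float) -> float:
    """Caloric value of one liter of CO2 at a tabulated R.Q."""
    return KCAL_PER_LITER_O2[rq] / rq


def relative_error(actual_rq: float, assumed_rq: float = 0.95) -> float:
    """Fractional error in heat production when CO2 is converted at assumed_rq
    but the subject's actual R.Q. is actual_rq (negative = underestimate)."""
    return kcal_per_liter_co2(assumed_rq) / kcal_per_liter_co2(actual_rq) - 1.0


for rq in sorted(KCAL_PER_LITER_O2):
    print(f"actual R.Q. {rq:.2f}: {100 * relative_error(rq):+.1f}%")
```

Under these assumptions the error stays small while the working R.Q. lies near unity and grows as the quotient falls toward .80.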

There remain to be mentioned a few miscellaneous 
points both in favor of the method and in opposition to 
its use. Carpenter (124) found decreasing amounts of 
carbon dioxide during work periods, which was
contrary to the typical findings of Waller, and Mague
(552) contended that fatigue has little or no effect upon
the respiratory phenomena caused by work, which is
also contrary to Waller’s experimental results. Polokov
(468), however, found a fairly significant rise in carbon
dioxide output with the continuation of muscular work. 
This confirmation of Waller’s results is the more sur- 
prising on account of the impossibly bad technique 
used in Polokov’s investigation. 

A possible limitation is seen in the study of
Hindmarsh (294), in which reliance upon carbon dioxide
records alone tended to obscure the lowering of basal 
metabolic rates found in warm climates. It is possible 
that certain minor factors of industrial interest might 
also be thus obscured. A more definite limitation in 
the use of the Waller technique is indicated in the study
of Schneider and Clark (504), who found that carbon
dioxide output is totally unreliable as a measure of 
work performed under low atmospheric pressures. 
Possible advantages of the technique, however, are seen 
in the studies of Griffith, et al. (233), who discovered 
the rather dubious advantage that carbon dioxide out- 
put is not subject to seasonal fluctuations but that 
oxygen consumption is, and of Benedict and Cathcart 
(72), indicating that there is not as long an after-effect
from work on carbon dioxide output as there is on 
oxygen consumption. This would appear to mean that 
tests using carbon dioxide alone could be terminated 
earlier. 

We have devoted considerable space to a considera- 
tion of the Waller Method because of its great poten- 
tial significance in industrial work. The author of this 
review feels that he has collected sufficient evidence to 
entitle the method to a completely new evaluation, on 
the grounds that there is ample evidence either to re- 
fute or else severely limit nearly all of the charges made 
against the method, provided its application is limited 
to the ordinary range of industrial labor. The tech- 
nique is of sufficient importance to demand that it shall 
not be overthrown on the basis of a single experiment, 
nor can it be established on such a basis. Similarly, we 
cannot expect either to destroy or to verify the method 
on purely theoretical grounds. There is too much to 
be said on each side. 

There are two important research projects in
connection with the method which demand immediate
attention. One of these is the conducting of several
types of exploratory experiments by independent
investigators, some of whom should perform
experimentation in industry and some of whom should work in
the laboratory. The establishment of the validity of 
a respiratory technique is too involved a problem to 
leave in the hands of a single research worker. The 
other type of work now needed is a pushing forward of 
the method of rechecking the statistics accumulated 
from other investigations, as has already been done to 
the detriment of the Waller Method by Greenwood 
and Newbold, quoted above, and to the advancement 
of the method by Waller himself and indirectly to its 
advancement by King. There is available a large
literature which may be worked over, using both the 
Waller Method and the Douglas-Haldane computa- 
tions in order to determine the limits in accuracy to be 
expected from the former. Since this project will yield 
results in a shorter space of time than would be
necessary in laboratory experimentation, the writer urges
that some worker with adequate statistical training 
review the studies of muscular work listed in our biblio- 
graphy with this purpose in mind. Useful data will be 
found in early studies such as those of Douglas and 
Haldane (168) and Carpenter (124) and many other 
studies preceding these. Tabulations of oxygen
consumption and carbon dioxide output will be found in
many of the Carnegie Institution publications such as 
that of Benedict and Murschhauser (79). Even minor
studies on pathology and special conditions, such as 
that of Peabody and Sturgis (461), will be found
useful in that separate carbon dioxide and oxygen data at
rest and during exercise are supplied. The recent 
sources are fully as valuable, so the investigator should 
not overlook the studies by Bock et al. (91), which
appear to give some support to the Waller Method, as
well as taking into consideration the large amount of 
modern work done in Germany by such men as Simon- 
son (519-522). 

Gas Analysis Under the Waller Method. We have
seen already the desirability of experimenting to 
establish the limits within which the Waller Method 
may be applied to industrial problems. Anticipating 
that investigators will wish to carry out such experi- 
mentation in industry with an apparatus as simple, 
portable, and accurate as possible, we have collected 
information on a variety of methods of gas analysis 
from which choice may be made to suit local condi- 
tions and needs. 

One of the most promising techniques for the deter- 
mination of the carbon dioxide content of expired air 
depends upon the difference in electrical conductivity 
of a platinum wire surrounded by a gas mixture in 
which the proportion of carbon dioxide is variable. 
Daynes (156), in a mathematical paper, discusses 
the theory of a device constructed on the principle 
originally developed by G. A. Shakespear. Another 
illustration and description of the device appeared 
anonymously (12) in a note concerning the apparatus 
manufactured by Cambridge and Paul Instrument 
Company, London. In the form described two 
platinum wire spirals are suspended in gas mixtures 
and placed on opposite arms of a Wheatstone bridge.
The carbon dioxide content of the gas surrounding one 
of the spirals may be varied and, by noting the
deflection of the galvanometer, readings of carbon
dioxide percentage may be directly made. This original
device was intended for use in the flues of boiler
plants, but Hill (279), in 1922, sent a note to the 
Journal of Physiology to the effect that the instrument 
was being tested at the Physiological Laboratory ex- 
perimentally and at the Royal Infirmary of Manches- 
ter clinically in order to determine whether the device 
could be adapted to physiological use in measuring the 
carbon dioxide content of respired gases. 
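The bridge arrangement can be sketched numerically. Because carbon dioxide conducts heat less readily than air, a heated spiral surrounded by gas richer in carbon dioxide runs hotter and its resistance rises; the bridge is then out of balance and the galvanometer deflects. The sketch computes the voltage across the galvanometer of an ideal bridge and the resistance that restores balance; the supply voltage and resistances are hypothetical, and lead and galvanometer resistances are ignored.

```python
def bridge_output(v_supply: float, r1: float, r2: float, r3: float, rx: float) -> float:
    """Voltage across the galvanometer of an ideal Wheatstone bridge.

    One side is the divider r1 (upper) / r2 (lower); the other is r3 (upper) / rx
    (lower), where rx stands for the spiral exposed to the sample gas.
    """
    v_a = v_supply * r2 / (r1 + r2)   # potential at the junction of r1 and r2
    v_b = v_supply * rx / (r3 + rx)   # potential at the junction of r3 and rx
    return v_a - v_b


def balancing_resistance(r1: float, r2: float, r3: float) -> float:
    """Value of rx giving zero galvanometer reading: r1 * rx = r2 * r3."""
    return r2 * r3 / r1


# Hypothetical values: a balanced bridge, then the sample spiral warmed by CO2.
print(bridge_output(2.0, 10.0, 10.0, 10.0, 10.0))   # balanced
print(bridge_output(2.0, 10.0, 10.0, 10.0, 10.2))   # rx has risen
```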

The next important contribution to the method was 
made by Palmer and Weaver (459) who demonstrated 
the rapidity and accuracy with which simple gas
mixtures with only one gas variable may be analyzed.
These workers did not use a calibrated galvanometer 
but determined their carbon dioxide percentages in 
terms of resistance necessary to bring the galvanometer
needle to zero. Using this type of apparatus they were 
able to attain readings which usually checked to the last 
decimal with Haldane determinations. About the same 
time Knipping (336) advocated modifying the Sie- 
mens and Halske instrument for determination of 
nitrous oxide in such a way as to make possible deter- 
mination of carbon dioxide percentage in alveolar air. 
(The Siemens and Halske instrument is described by 
Palmer and Weaver, page 43.) Knipping (327)
describes the modified Siemens and Halske device and
advocates its use particularly in connection with
anaesthesia and other special clinical purposes. In the
meantime Rabinowitch and Bazin (475) described 
their use of the thermal conductivity method in deter- 
mining basal metabolic rates. They appear to have 
used the Cambridge Instrument Company’s device. 
Remarkable correlations with basal metabolic rates de- 
termined by the Tissot-Haldane Method were reported. 
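Palmer and Weaver's practice of reading carbon dioxide in terms of the resistance needed to bring the galvanometer to zero implies a calibration curve relating balancing resistance to percentage, presumably established with mixtures of known composition such as Haldane-analyzed samples. A minimal sketch, assuming linear interpolation between calibration points; the calibration values are invented for illustration and belong to none of the instruments described.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical (balancing resistance in ohms, CO2 per cent) pairs, sorted by resistance.
CALIBRATION: List[Tuple[float, float]] = [(100.0, 0.0), (104.0, 2.0), (108.0, 4.0), (112.0, 6.0)]


def co2_percent_from_resistance(resistance: float,
                                calibration: List[Tuple[float, float]] = CALIBRATION) -> float:
    """Linearly interpolate CO2 percentage from a balancing resistance."""
    resistances = [r for r, _ in calibration]
    if not resistances[0] <= resistance <= resistances[-1]:
        raise ValueError("reading lies outside the calibrated range")
    i = bisect_left(resistances, resistance)   # first point with r >= reading
    if resistances[i] == resistance:
        return calibration[i][1]
    (r_lo, c_lo), (r_hi, c_hi) = calibration[i - 1], calibration[i]
    return c_lo + (c_hi - c_lo) * (resistance - r_lo) / (r_hi - r_lo)


print(co2_percent_from_resistance(105.0))
```

Refusing readings outside the calibrated range avoids extrapolating a curve that has not been checked against analyzed mixtures.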

Ledig and Lyman (370) have recently described the 
construction and operation of an apparatus of this type 
capable of recording carbon dioxide accurately to .01 
per cent and oxygen to .02 per cent, thus satisfying all 
the needs of respiratory work. The apparatus is costly 
and delicate, however, and is made so principally on 
account of the oxygen analysis feature. A slide wire is 
used to bring the galvanometer to zero, as an improve- 
ment upon the direct reading galvanometer. Techni- 
cal details of construction are given, such as methods 
of holding the electrical current constant and maintain- 
ing constant temperature around the gas cells. 

Illustrations of electrical conductivity devices and 
descriptions of their use will be found in Knipping and 
Rona (343), pages 171-175. The Cambridge Instru- 
ment Company Catalog, No. 5 (119) describes their 
Portable Carbon Dioxide Indicator for Alveolar Air 
on page 15. This is developed from the early Shake- 
spear type, in line with suggestions made by A. V. Hill. 
It is calibrated from 0 to 10 per cent, with an accuracy 
of one-tenth of one per cent. Electrical carbon dioxide 
meters are made for engineering use by the Brown
Instrument Company (114) and by Leeds and Northrup
(372). In a private communication, the Brown
Instrument Company asserts that a model can be made at
reasonable cost with an indicating scale from 0 to 10 
per cent, at an accuracy of one-half of one per cent. 
The carbon dioxide meters supplied by the Brown 
Instrument Company and by Leeds and Northrup can 
both be made continuously recording if desired. Wil- 
lard and Smith (612) describe a compound com- 
mercially known as Dehydrite, which is used by Ledig 
and Lyman in their Katharometer. 

Other Simplified Methods of Gas Analysis. Efforts
to simplify gas analysis have taken many forms and 
analysis instruments of several widely different types 
are available. We have already considered the use of 
electrometric methods of determining carbon dioxide 
percentage, employing Shakespear’s Katharometer in 
one form or another. An electrometric method of an
entirely different sort is described by Bayliss (37). 
“The method is chiefly valuable for determining the
amount of carbon dioxide absorbed by a caustic soda 
solution and it is believed that it forms the best and 
quickest method for rapidly absorbing and measuring 
relatively large quantities of this gas.” Penn (191) 
also describes a somewhat similar method based upon 
measurements of the conductivity of carbon dioxide 
dissolved in barium hydroxide. It is possible with 
this method to detect 1 x 10^(-…) of a gram of carbon
dioxide. Applications of conductivity methods of
determining carbon dioxide in solution are also
discussed by Nicloux (449). These methods did not appear
easily adaptable to industrial use, so no attempt has been 
made to cover the literature concerning them. 
In a preceding section we mentioned certain of the
simplifications which had been proposed for use in
connection with some form of Haldane gas analysis. There
still remain two or three other modifications which 
must receive our attention. 

Boigey's account (94) of Waller's portable gas an- 
alysis apparatus for the measurement of carbon dioxide 
in gas mixtures has been quoted above. He has written 
another article (95) which appears to deal with simi- 
lar topics, but which was unavailable to the reviewer. 

Another very simple type of apparatus is described 
by Abady (1) on page 381 of his Gas Analyst's Manual.
Oechelhauser's carbon dioxide apparatus is there de- 
scribed as a simple potassium hydroxide absorption 
arrangement which apparently can be read to about 
one-tenth of one per cent. There appears to be a pos- 
sibility of serious error, however, in the lack of pro- 
tection of the gas mixture from the potassium hy- 
droxide prior to taking the initial manometer reading. 
Another drawback is the evident necessity for using 
large samples. 

King (322) and Wardlaw (607a) are in favor of an
open circuit method in which the carbon dioxide is col- 
lected in soda lime and its amount determined gravi- 
metrically. Krogh (354) also states, page 43, that 
“when a gas analysis is considered a thing to be avoided 
the contents of the (Douglas) bag can be taken through 
a Haldane set of vessels for absorbing water vapor and 
carbon dioxide and the total carbon dioxide determined 
by weighing." The most evident drawback of gravi- 
metric methods is the fact that a delicate and expensive 
balance is required. 
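The gravimetric alternative converts a gain in weight into a volume of carbon dioxide. Taking the density of carbon dioxide at 0° C. and 760 mm. as about 1.977 grams per liter, the sketch below makes the conversion and shows why a sensitive balance is needed: a weighing error of 10 milligrams corresponds to roughly 5 cc. of gas. It assumes the weighed vessels gain nothing but carbon dioxide, water vapor having been taken up beforehand as in the Haldane train Krogh mentions; the function name is illustrative.

```python
# Approximate density of CO2 at 0 °C and 760 mm Hg, grams per liter.
CO2_DENSITY_G_PER_L = 1.977


def co2_liters_from_mass_gain(mass_before_g: float, mass_after_g: float) -> float:
    """Volume of CO2 (liters, reduced to 0 °C and 760 mm) absorbed by the weighed vessels."""
    gain_g = mass_after_g - mass_before_g
    if gain_g < 0:
        raise ValueError("absorber lost weight; check for leaks or drying errors")
    return gain_g / CO2_DENSITY_G_PER_L


# Gas volume equivalent to a 10 mg weighing error, in cc.
print(f"{1000 * co2_liters_from_mass_gain(0.0, 0.010):.1f} cc")
```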
Melka (423) describes a simple apparatus to analyze
respiratory gases, but the journal in which this is
described was unavailable to the reviewer. Boimhiol
(108) describes an open circuit arrangement, known as 
a “Sphyximetre,” which is intended to measure the 
oxygen consumption of the parts of a single breath. 
Two other interesting devices are those of Guillaume 
(235), who describes a spectroscopic method of gas 
analysis for clinical use, and Griffiths (234), who de- 
scribes an elaborate and costly instrument for analyzing 
gases through utilizing the difference in the rate at 
which sound passes through them. The device is not 
likely to be of service in respiratory metabolism investi- 
gations, as the results are of a low order of accuracy, 
and the special features of the method are of little use 
in physiological experiments. 
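The principle behind Griffiths' instrument can be illustrated with the ideal-gas formula for the speed of sound, c = sqrt(γRT/M), applied to a mixture of air and carbon dioxide. The sketch assumes ideal-gas behavior, mole-fraction averaging of molar mass and heat capacity, and ratios of specific heats of about 1.40 for air and 1.29 for carbon dioxide near room temperature; it is not a model of the actual instrument.

```python
import math

R = 8.314                         # gas constant, J/(mol K)
M_AIR, M_CO2 = 0.02896, 0.04401   # molar masses, kg/mol
CV_AIR = R / (1.40 - 1.0)         # molar heat capacity at constant volume, from gamma
CV_CO2 = R / (1.29 - 1.0)


def sound_speed(co2_fraction: float, temp_k: float = 293.15) -> float:
    """Speed of sound (m/s) in an ideal air-CO2 mixture of the given CO2 mole fraction."""
    if not 0.0 <= co2_fraction <= 1.0:
        raise ValueError("mole fraction must lie between 0 and 1")
    molar_mass = co2_fraction * M_CO2 + (1 - co2_fraction) * M_AIR
    cv = co2_fraction * CV_CO2 + (1 - co2_fraction) * CV_AIR
    gamma = (cv + R) / cv
    return math.sqrt(gamma * R * temp_k / molar_mass)


for x in (0.0, 0.02, 0.04, 0.06):
    print(f"{100 * x:.0f}% CO2: {sound_speed(x):.1f} m/s")
```

Under these assumptions a few per cent of carbon dioxide changes the speed by only a few meters per second, so small differences in composition demand a very precise measurement of speed.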

Other Open Circuit Methods of Metabolic Rate
Determination. There are several types of open circuit
methods of determining metabolic rate which have not
yet been considered in these pages, as well as certain
forms closely related to the Haldane methods which 
have already been described. Of the latter type is the 
old Tissot Method (562) and the majority of the 
modern methods described by Klein and Steuber (325).
Here belongs also the account by Labbé and Stévenin
(359) of basal metabolism determination, using the
Laulanié gas analysis apparatus in conjunction with a
Verdin spirometer, which is a dry gas meter calibrated
decimally in liters. These same French gas analysis 
instruments are also described by Amar (10), along 
with the description of other French types of analysis 
apparatus. 
An important historical document, but one not likely
to interest the industrial worker, is the early mono- 
graph of Benedict (49) on “The Composition of the 
Atmosphere with Special Reference to Its Oxygen 
Content.” There is a description of the Sonden-Petterson
apparatus, which is an elaborate and sensitive
development from the original Orsat principle. The 
chemistry of the method is discussed and other informa- 
tion, such as a history of air analysis, is supplied, as 
well as complete data on determinations made near the 
laboratory, over the ocean, and on Pikes Peak. 

Other gas analysis methods for use under special con- 
ditions are those of Gappellen and Noyons (207) and 
Suffle, Hoffman, and Walz (549). 

Turning now to open circuit methods in which the 
respiratory exchange is determined while the experi- 
ment is in progress, we find an interesting variety of 
ingenious arrangements for recording either the car- 
bon dioxide production or the oxygen consumption or 
both. 

Gesell and McGinty (213) constructed a very elabo- 
rate apparatus employing two different principles in 
determining respiratory exchange and in recording on 
smoked paper the variations in percentages of the two 
gases concerned. In determining percentage of carbon 
dioxide use was made of the fact that the acidity of a 
water solution varies with the partial carbon dioxide 
pressure with which it is in equilibrium. This dif- 
ference in acidity was measured with a potentiometer 
by means of an ingenious recording arrangement. The 
oxygen was recorded continuously through utilizing 
the fact that a Bunsen burner flame temperature varies
with the percentage of oxygen present in the air sup- 
plied. The flame was made to play on a current of 
water, and records were made by use of a thermo- 
couple vessel and a potentiometer. The apparatus as 
described was used successfully in animal experiments. 
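The carbon dioxide principle Gesell and McGinty used can be illustrated for the simplest case, pure water in equilibrium with the gas. If the water's own ionization and the second dissociation of carbonic acid are neglected, [H+]² ≈ K1 × (dissolved CO2), and dissolved CO2 is proportional to partial pressure, so the pH falls by half a unit for each tenfold rise in carbon dioxide pressure. The constants are approximate values at 25° C. and are assumptions of this sketch, as is the use of pure water; the text does not describe the solution the authors actually used.

```python
import math

# Approximate constants at 25 °C (assumptions of this sketch).
HENRY_MOL_PER_L_ATM = 0.034   # solubility of CO2 in water, mol/(L atm)
K1 = 4.47e-7                  # apparent first dissociation constant of carbonic acid


def ph_of_water(p_co2_atm: float) -> float:
    """pH of pure water in equilibrium with CO2 at partial pressure p_co2_atm,
    neglecting water's own ionization and the second dissociation."""
    if p_co2_atm <= 0:
        raise ValueError("partial pressure must be positive")
    dissolved = HENRY_MOL_PER_L_ATM * p_co2_atm   # mol/L of dissolved CO2
    hydrogen_ion = math.sqrt(K1 * dissolved)      # from [H+]^2 = K1 [CO2(aq)]
    return -math.log10(hydrogen_ion)


# Percentages converted to partial pressure assuming a total pressure of 1 atm.
for pct in (0.04, 1.0, 4.0, 8.0):
    print(f"{pct:4.2f}% CO2: pH {ph_of_water(pct / 100):.2f}")
```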

The description of the apparatus developed by
McClendon, Anderson, Steggerda, Conklin, and Whitaker
(418) has been abstracted in Physiological Abstracts,
Volume 13, No. 2927: “An entirely self-contained
spirometer is described which resembles in principle 
a Haldane gas analysis apparatus, in which the sub- 
ject's lungs form the oxygen absorption chamber and 
a soda-lime tower absorbs the carbon dioxide. Of the 
two spirometer domes (each 100 liters), one is ini- 
tially filled with carbon dioxide-free air and the other 
empty. The subject inspires from the former and ex- 
pires into the latter by virtue of suitably arranged 
valves for 10 to 15 minutes. The air is then forced 
back through the soda-lime into the first spirometer. 
From the known volumes of air expired, and of the 
latter after removal of carbon dioxide, respiratory
quotient and oxygen consumption rate are obtained. 
Constant temperature is maintained throughout by im- 
mersing the apparatus in a large bath of water. The 
volumes may be read to 10 cc., and buoyancy errors 
eliminated by an automatic chain device.” 

A standard open circuit method widely used in
Germany is that which utilizes the Zuntz-Geppert
apparatus. An early description of this apparatus was
written by Magnus-Levy (398), and a recent description
of a modern form by Paechtner (457). Zuntz (623)
describes a semi-portable form used on the continent, 
the apparatus being adapted for use in studies of work 
metabolism. It consists of a dry gas meter strapped to 
the back with a special arrangement for taking a series 
of minute samples and collecting them within a single 
sampling bottle. 

Regelsberger (480) describes a complex electrical 
and chemical apparatus making it possible to record 
carbon dioxide automatically with an accuracy within 
one-tenth of one per cent. 

One of the most interesting open circuit arrange- 
ments is that of Hanriot and Richet (247). This very 
early form of apparatus has also been described by 
Higley and Bowen (278), from whom our description 
was taken. Hanriot and Richet used an inspiratory 
gas meter, an expiratory gas meter, absorption appara- 
tus for carbon dioxide, and, finally, a third gas meter. 
A moment’s reflection will show that both carbon di- 
oxide and oxygen volumes can be derived from the dif- 
ferences in the three volumes registered. Krogh (354) 
says of the method (page 46), “The experiments made 
by Hanriot and Richet are not particularly accurate, 
but in the writer’s opinion there is no doubt that the pos- 
sibilities of this method are great. With modern gas 
meters of sufficient size placed in one water bath, vol- 
umes can be measured accurately to 1/10,000, and ar- 
rangements could easily be made giving a continuous 
graphic record of ventilation, oxygen absorption and 
carbon dioxide output.” In the light of modern knowl- 
edge it would also seem necessary to add a blower or 
similar device to aid the lungs in driving the respired
air through the three gas meters and absorption 
bottles.
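The "moment's reflection" about Hanriot and Richet's three meters can be written out. With all volumes reduced to the same temperature, pressure, and dryness, the expired volume equals the inspired volume less the oxygen taken up plus the carbon dioxide given off; after the carbon dioxide is absorbed, what remains equals the inspired volume less the oxygen. The sketch assumes inspired air carries negligible carbon dioxide and that no nitrogen is exchanged; the function name is illustrative.

```python
from typing import Tuple


def hanriot_richet(v_inspired: float, v_expired: float,
                   v_after_absorption: float) -> Tuple[float, float, float]:
    """Return (CO2 output, O2 intake, R.Q.) from three gas-meter readings.

    All volumes must be reduced to the same temperature, pressure and dryness.
    """
    co2_output = v_expired - v_after_absorption   # gas removed by the absorber
    o2_intake = v_inspired - v_after_absorption   # inspired gas minus what remains
    if o2_intake <= 0:
        raise ValueError("readings imply no oxygen uptake")
    return co2_output, o2_intake, co2_output / o2_intake


print(hanriot_richet(100.0, 99.0, 95.0))
```

With readings of 100, 99, and 95 liters, for example, the sketch gives 4 liters of carbon dioxide, 5 liters of oxygen, and an R.Q. of .80.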

Total Pulmonary Ventilation as an Index of Energy
Consumption in Light Work. In our citation of vari- 
ous authors whose work seemed to support the Waller 
Method we quoted the assertion of Becker and Olsen 
(39) to the effect that pulmonary ventilation alone 
might suffice as an index of energy consumption, pro- 
vided the determinations were made at a time when 
the body was in a state of physiological equilibrium. A
similar point was made many years previously by Han- 
riot and Richet (247). These workers found very 
little fluctuation in carbon dioxide percentage when 
the ventilation in liters ranged from 11 to 18 per min- 
ute. They definitely recommend ventilation determin- 
ation as an approximation method in light work 
experiments, although they mention that it is not
suitable for use in heavy work.
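If the percentage of carbon dioxide in expired air stays nearly constant, as Hanriot and Richet found for ventilations of 11 to 18 liters a minute, carbon dioxide output is simply proportional to ventilation, and ventilation alone can serve as a relative index. The sketch shows this under stated assumptions: the 3.5 per cent expired fraction is hypothetical, inspired air is taken to contain 0.03 per cent carbon dioxide, and inspired and expired volumes are treated as equal. Once the expired fraction itself changes, the proportionality no longer holds.

```python
def co2_output(ventilation_l_per_min: float, expired_co2_fraction: float,
               inspired_co2_fraction: float = 0.0003) -> float:
    """CO2 output (liters/min) approximated as ventilation times the difference
    between expired and inspired CO2 fractions."""
    return ventilation_l_per_min * (expired_co2_fraction - inspired_co2_fraction)


# Hypothetical expired fraction held constant across the reported range.
EXPIRED_FRACTION = 0.035
for ventilation in (11.0, 14.0, 18.0):
    print(f"{ventilation:4.1f} L/min -> {co2_output(ventilation, EXPIRED_FRACTION):.3f} L CO2/min")
```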

No one, apparently, has made a careful study with
the specific purpose of determining the validity of
ventilation records alone as a substitute for more
complete respiration experiments. The article by
Kharina-Marinucci (320) was unavailable to the
reviewer, so he does not know whether or not it may be
a contribution of this sort. There have, however, been
several studies in which the authors suggested the
possibility of using total ventilation as an approximate index, and
data are available in many other articles which may be 
used in checking the validity of the method as has been 
suggested in the case of the Waller Method. 
Hansen (248) demonstrated a parallelism between
carbon dioxide output, oxygen intake, and total ventila- 
tion in an experiment on the bicycle ergometer. Knoll 
(344) even gives consideration to the relationship be- 
tween respiratory rate and the volume of air expired, 
although there seems to be little likelihood that res- 
piratory rate can ever be developed into even an 
approximately reliable index. One of the best general 
treatments of total ventilation as an index of energy 
exchange is that of Magne (396). He discusses this 
subject in connection with his general account of res- 
piratory exchange during muscular work and quotes 
considerable tabular data from other experimenters. 

It must be recognized without question that many 
serious objections of a theoretical nature can be raised 
against the use of total ventilation as suggested, but 
there is evidence that it is at least more accurate than 
the use of loss of weight, and certain other factors 
which have been used in the past as an index of energy 
cost. All of the reasons which have been urged against 
the Waller Method may be put with even more force 
here, as well as a few special considerations applying 
only to this approximation method. The method, if 
developed for industrial use at all, could be used only 
to show relative differences, as there is but little likeli- 
hood that the results could be expressed in terms of cal- 
ories as Waller does in his computations. 

Boigey (97) has already expressed himself in opposition to the use of this method, saying that “the total ventilation is not always in accord with the quantity of oxygen consumed or carbon dioxide produced.” Herxheimer and Kost (272), in a recent experiment, show a close correspondence between carbon dioxide output and total ventilation but a lesser correspondence between oxygen consumption and ventilation. This would argue, at least in part, against the use of ventilation figures alone. The use of ventilation figures would also be contra-indicated under unusual atmospheric conditions and in the case of pathological subjects. McCann (413), for example, reports five cases of advanced pulmonary tuberculosis in which the ventilation rate is double the ordinary rate.

In keeping with our suggestion that the literature should be reviewed to discover the reliability and validity of total ventilation, we list chronologically a number of studies which appear to be especially useful for this purpose. In addition to the studies listed for use in verifying the Waller Method, the following investigations should be found significant:

Durig (177, 179), two studies in 1911 in which ventilation, carbon dioxide, and oxygen data are supplied in full. Douglas and Haldane (168), a study in 1912 giving detailed results of an experiment on walking. Their oxygen and carbon dioxide as well as total ventilation figures may be taken to show the reliability of ventilation alone. This has been done by the present writer, and one of the curves has been reproduced in a recent article (Page, 458). Liljestrand and Wollin (379), a study in 1913 supplying carbon dioxide tables and data concerning ventilation. Boothby (100), a study in 1915 seeming to confirm ventilation as an index. Ilzhofer (301), a careful experiment in 1925, with full details as to ventilation, oxygen, carbon dioxide, and respiratory quotient. Simonson (519-522), a series of recent experiments in which ventilation figures are shown to have some significance in industrial investigations. Bock et al. (90, 91, 556), a very recent series of painstaking and carefully controlled experiments in which a variety of figures are supplied, among them complete information on total ventilation.

Closed Circuit Methods for Determining Oxygen 
Consumption. An account of the theory and use of 
closed circuit methods of respiratory exchange deter- 
mination will be found in nearly all of the various man- 
uals which have been referred to in previous sections. 
Their importance is such, however, that we must refer to several types of closed circuit apparatus individually and review the literature in more detail than is done in most of the general references available in English. The most prominent of the closed circuit
types of apparatus in use is the Benedict form in one or 
another of its many varieties. It is worth our while 
here to review the history of the Benedict apparatus, 
tracing the many stages of its evolution down to the 
present time. 

The old Benedict Universal Apparatus was described 
in detail by Benedict (47) in 1909. This is the original 
instrument from which the long series of modifications 
has developed. It was an elaborate and costly ap- 
paratus constructed so as to permit determination of 
carbon dioxide output and oxygen consumption. Then 
in 1916 Benedict and Tompkins (82) described the 
Benedict Bed Calorimeter, a modification of the old
Universal. The description of the calorimeter ap- 
peared in an article criticizing certain types of appara- 
tus then current and enumerating several desiderata in 
metabolic apparatus and methods. These authors in- 
sisted upon accurate respiratory quotients and freedom 
from factors inducing /iusfumpung and other disturb- 
ances of respiration. The Benedict Bed Calorimeter 
was said to embody these desirable features and to be 
sufficiently accurate for all clinical use. 

Following this we come to the historic reference (51) in 1918 to the well-known Benedict Portable. While this device was intended to measure oxygen consumption alone, it is possible to determine carbon dioxide output by weighing the absorbing apparatus on a sensitive balance. The Benedict Portable was the first machine of its general type to employ a fan or blower. A second description of the apparatus and its method of use was written the following year (78). The next year Benedict and Collins (73) described the first simplification of the original Benedict Portable. The modification consisted of eliminating the three-way valve, the calcium chloride bottles, and certain minor parts, and assembling the entire apparatus into a truly portable unit. Complete instructions as to method of use were supplied in this article, as well as statistics demonstrating the validity of the apparatus.

In 1922 Roth (486-488) published a series of articles on a further simplification of the Benedict apparatus. He dispenses with the electric blower on the New Portable and substitutes flutter-valves of his own design. He describes the use of the kymographic record in determining basal metabolic rates with apparatus of this kind, and supplies instructions for the use of his modified form. Evidence is given that his modification is as accurate as the old and possesses obvious merits in its own right as well. This same material, with the addition of certain charts and tables, was published in another source (485) the following year.

Sanborn, a manufacturer of metabolic apparatus, 
published information on the use of his form of the 
Benedict apparatus at about this same time. His book 
(496) comprises contributions from 21 workers in 
various metabolic fields, and constitutes a working 
manual for the use of the physician and technician. 

The well-known Student Apparatus was described
in 1923 (65), and a review (54) of all the types of
apparatus used in the Nutrition Laboratory was writ- 
ten the following year. Chamber methods, portable 
apparatus, student apparatus, micro-respiration appara- 
tus, as well as ergometers and miscellaneous equipment, 
were all described in this contribution. It also included 
an extensive treatment of respiratory exchange, heat 
production, and the various factors affecting metabol- 
ism. 

Most of the references we have quoted so far are 
principally of historical interest to the industrial in- 
vestigator, but in 1925 Benedict (58) wrote a descrip- 
tion of a modification of the Benedict Student 
Apparatus which makes it possible to strap the device 
to a subject’s back in order to study his working meta- 
bolism. In the same article there is a description of a 
simplified and greatly improved Benedict-Collins apparatus. This device is only semi-portable but is useful where extreme accuracy is necessary. No investigator who intends to study working metabolism should overlook this description of the portable apparatus for work experiments, nor the theoretical discussion and comparison of open and closed circuit respiration apparatus which is also contained in this article. There is not much literature in English on the use of this new apparatus in industrial work, although the reader may be referred to an experiment by Benedict and Parmenter (80) in which this apparatus was employed. A French description (59) appeared in 1927 in which photographs are shown of the apparatus in actual use.

The latest modification described by Benedict (60) 
is that of a Field Respiration Apparatus for use in sur- 
veying racial differences in metabolism. It is a modi- 
fication of the portable form described in reference 58 
and “the apparatus has been so simplified that it enables the determination of the oxygen consumption (apparent volume) of an individual with but one major measurement, i.e., the time required for the absorption of six pumpfuls of oxygen.”
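Benedict's description reduces the reading to a single timing. Assuming each pumpful admits a known, calibrated volume of oxygen (the volume used below is an assumed figure, not one given by Benedict), the apparent oxygen consumption per minute follows from simple arithmetic. A minimal Python sketch with a hypothetical function name:

```python
def apparent_oxygen_rate(seconds_elapsed: float, cc_per_pumpful: float, pumpfuls: int = 6) -> float:
    """Apparent (uncorrected) oxygen consumption in cc per minute.

    seconds_elapsed: time the subject takes to absorb the admitted oxygen.
    cc_per_pumpful: calibrated volume of one pumpful; instrument-specific and
        assumed here, since the source does not state it.
    pumpfuls: number of pumpfuls timed (six in Benedict's field apparatus).
    """
    if seconds_elapsed <= 0 or cc_per_pumpful <= 0 or pumpfuls <= 0:
        raise ValueError("time, volume, and pump count must all be positive")
    total_cc = pumpfuls * cc_per_pumpful
    minutes = seconds_elapsed / 60.0
    return total_cc / minutes


# With an assumed 100 cc per pumpful and six pumpfuls absorbed in 180 seconds:
print(apparent_oxygen_rate(180, 100))  # 200.0
```

Because the quoted figure is an apparent volume, any reduction to standard temperature and pressure would be a separate step.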

The Benedict Apparatus has been widely used in Europe and has been subjected to various modifications there. We may mention the recent contribution of Muller (439) and the criticism and suggestions about handling the apparatus contributed by Dautrebande (153).

It must be noted, in closing our discussion of the Benedict Apparatus, that most of the older forms are now obsolete, and that Benedict himself has stated that the Nutrition Laboratory has now definitely abandoned closed circuit chamber methods in favor of the highly sensitive gas analysis technique for use with an open circuit chamber developed by Carpenter (122) in 1924.

Another apparatus of considerable importance is the 
closed circuit wedge-spirometer, described originally 
by Krogh (349) in 1913. The apparatus has come into universal use, and ten years later it was described in Danish (352), German (333), French (353), and English (330). A recent discussion of the method has
been contributed by Stolz (540), who describes certain 
modifications and improvements. Periera (463) de- 
scribes an experiment using the Krogh apparatus 
coupled with an open circuit arrangement for measur- 
ing the respiratory exchange by Haldane analysis. He 
succeeded in demonstrating that the oxygen consump- 
tion at rest while breathing air and while breathing 
nearly pure oxygen is substantially identical. Presumably this finding would also hold for the Benedict apparatus and other closed circuit methods.

A description of the Benedict and Krogh types by no 
means exhausts the list of closed circuit devices, as the 
following sampling will indicate. 

Hannan and Lyman (246) describe a rather large 
and cumbrous apparatus similar to the wedge-type 
spirometer of Krogh’s. The amount of oxygen con- 
sumed is obtained from the graph by plotting a straight 
line through the trend of the deepest expiration points 
and reading the fall in a given number of minutes. 
Another apparatus for measuring oxygen consumption 
graphically is the portable calorimeter described by 
McClendon (414). Schadow (498) described a closed circuit apparatus without blower or valves in which a wide-bore tube leads from a mouth-piece to a spirometer within which the soda-lime is located. He reports comparison experiments with the Benedict Apparatus and the Benedict-Knipping apparatus in which there was very close agreement. It was not recommended, however, for use in certain pathological cases or in work experiments. Herxheimer (271) describes a closed circuit apparatus so simple that ordinary laboratory tubing has replaced the usual wide-bore respiratory tubing, and the soda-lime is contained in an ordinary bottle with a rubber stopper. A mouth-piece and valves are used and there is no blower. Other simple types of apparatus are described by Jones (313) and by Soto and Torino (531). In the latter device the subject breathes through a mouth-piece back and forth into the spirometer.
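The graphical reading attributed to Hannan and Lyman, a straight line through the trend of the deepest expiration points with the fall read off over a given number of minutes, can be approximated numerically with a least-squares line. The Python sketch below assumes the trough points have already been read from the record as (minute, volume) pairs and that the volume reading falls as oxygen is absorbed; the function name and the sample points are hypothetical.

```python
def fitted_fall(points, minutes):
    """Fall in spirometer volume over `minutes`, from a least-squares line.

    points: (time_in_minutes, volume_reading) pairs taken at the deepest
        expiration of each breath (hypothetical input format).
    Returns a positive number when the fitted volume is falling.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two trough points")
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    if sxx == 0:
        raise ValueError("trough points must span more than one instant")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in points)
    slope = sxy / sxx  # volume change per minute along the fitted line
    return -slope * minutes


# Hypothetical troughs in cc, falling about 250 cc per minute.
troughs = [(0, 3000), (1, 2750), (2, 2500), (3, 2250)]
print(fitted_fall(troughs, 1))  # 250.0
```

Fitting by least squares replaces the authors' hand-drawn line; it is a substitution for illustration, not their stated procedure.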

Many supplementary devices and modifications of various parts of closed circuit apparatus have been proposed. A few such suggestions may be enumerated here.

Pierce (466) describes an integrating device for use with the Benedict Apparatus, eliminating the ratchet-wheel in connection with the spirometer. Guthrie (238) describes a convenient and accurate spirometer and also comments on absorption substances: “Different samples of soda-lime purchased in the open market show great variations in carbon dioxide absorption powers, some samples being several times more efficient than others.” Roth (484) also comments on absorbents, pointing out that a good carbon dioxide absorbent may have a low moisture-absorbing power, which would interfere with the efficiency of the apparatus.

Moore (431) reports on a combination mouth- and 
nose-piece designed to take the place of the mouth- 
piece and nose-clip used on early forms of the Bene- 
dict Portable, and Hendry, Carpenter, and Emmes 
(268) show that the use of a mouth-piece with the 
Benedict Portable is superior to nose-piece or half- 
face masks. Knipping (338) asserts that the use of the 
kymograph is essential in all short-run experiments 
using the Benedict type of apparatus. 

Closed Circuit Methods for Determination of Oxy- 
gen Consumption and Carbon Dioxide Output. In 
the preceding section we traced the evolution of 
closed circuit methods from the early form in which 
oxygen consumption and carbon dioxide output were 
both determined, giving the impression that de- 
velopment had been entirely in the direction of 
simplification, and that closed circuit methods with 
provision for the determination of both gases should be 
considered obsolete. There have, on the contrary, been 
many modifications of the closed circuit principle 
which have retained the original feature of determin- 
ing both gases. 

Although the Benedict forms now in use are constructed to measure oxygen consumption alone, the greater part of the many early studies of the Nutrition Laboratory was made with some form of the old Benedict Universal Apparatus (47). Among the more important of these studies are the well-known investigations of resting metabolism by Benedict and Carpenter (70), and numerous studies of working metabolism, such as those of Benedict and Murschhauser (79) and Smith (525).

One of the most elaborately constructed respiration 
calorimeters using the standard closed circuit method 
was that of the Russell Sage Institute of Pathology in 
Bellevue Hospital, described by Riche and Soder- 
strom (482). 

A new form of apparatus in which both carbon 
dioxide and oxygen determinations are made has been 
developed by Simonson (517). The issue of the 
journal in which this apparatus was described was un- 
available at the time of this writing, but it appears 
that the apparatus is a development of the Benedict 
form and is a portable instrument suitable for a variety 
of work experiments. 

Probably the most important work which has been 
done in developing the closed circuit principle for the 
determination of both gases is the extensive experi- 
mentation on the part of Knipping in Germany. We 
have listed a rather large number of his writings in 
spite of the fact that there is considerable duplication 
among them, hoping that the wider selection of jour- 
nals will make his material more generally accessible. 

Articles 329, 330, 332, and 340 in our bibliography 
each supply fairly complete information concerning 
his apparatus. He reviews all of the important 
methods in use on the continent (332), concluding that the Benedict Portable is the most useful on account of its simplicity and the lower possibility of error which this brings about. The Sanborn type usually supplied, however, does not yield carbon dioxide records, and so it is impossible to compute respiratory quotients. Knipping then describes his modification of the Benedict apparatus in which the carbon dioxide is absorbed in a special flask so arranged that the gas may, after the close of the experiment, be driven out of combination with the potassium hydroxide and back into the spirometer bell. This is accomplished by adding a quantity of 15 per cent sulphuric acid to the potash solution in which the carbon dioxide has been absorbed. Further descriptions of the apparatus are found in an article (340) which emphasizes the desirability of determining both respiratory gases, the necessity of eliminating back pressure of any kind in the breathing system, and the need for an arrangement for circulating the air through the spirometer rather than depending on a direct connection and pumping it back and forth by means of the subject’s own respiratory movements. He mentions that Krogh’s apparatus provides for a circulation of air, but that it offers resistance to breathing and is liable to develop leaks. His own apparatus fulfills all three requirements: both gases are measured, a blower is used to relieve back pressure, and the air is circulated as in the Benedict apparatus. He describes the simplicity of the apparatus and speaks also of the large amount of experimentation which entered into developmental work in overcoming technical difficulties with the use of the new flask. He claims ample accuracy for both carbon dioxide and oxygen and states that the apparatus is adaptable either to long-time or short-run experiments on animals, children, or man.
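Knipping's objection to the oxygen-only forms rests on the definition of the respiratory quotient as the ratio of carbon dioxide output to oxygen consumption, so both gases must be measured over the same period. A minimal Python sketch, with a hypothetical function name and figures, that refuses to report a quotient when no carbon dioxide record exists:

```python
from typing import Optional


def respiratory_quotient(co2_output: Optional[float], o2_consumed: float) -> float:
    """Respiratory quotient = CO2 output / O2 consumed (same units, same period).

    co2_output is None when the apparatus records oxygen only, as with the
    oxygen-only forms discussed in the text; no quotient can be computed then.
    """
    if co2_output is None:
        raise ValueError("no carbon dioxide record: quotient cannot be computed")
    if o2_consumed <= 0:
        raise ValueError("oxygen consumption must be positive")
    return co2_output / o2_consumed


# Hypothetical volumes in cc over the same interval.
print(respiratory_quotient(200.0, 250.0))  # 0.8
```

In general physiology the quotient is usually taken to lie near 0.7 for fat oxidation and near 1.0 for carbohydrate, which is why an oxygen-only record leaves the fuel mixture, and hence the caloric value of the oxygen, uncertain.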

A brief description (339) of the flask which he adds to the Benedict apparatus is available, as well as another description (333) in more detail. It is described also in an article (331) in which mention is made of the fact that the 47 per cent potash solution used is supplied by Merck, and that the flask is obtainable from Albert Dargatz, Hamburg 1, Pferdemarkt. In another article (329) mention is made of the fact that the complete apparatus, consisting of pump, spirometer, and special flask, may be obtained from this same source. An article in 1925 (340) mentions that it is unnecessary to drive out the carbon dioxide by means of sulphuric acid for purposes of volumetric determination, but that its amount can be determined very accurately while still in solution by means of a hydrometer. There are further notes (328) on the use of the hydrometer or pycnometer in the determination of the specific gravity of the potash solution and hence the percentage of carbon dioxide absorbed therein. It is also possible to substitute a refractometer for the hydrometer in making this determination.

A later article (341) is particularly concerned with the mask to be used in connection with a closed circuit apparatus. Knipping favors a special type of full-face mask having an inflated rubber ring around the edge of a metallic cup. The same article describes a simplified form of the Knipping apparatus, as supplied to the Iceland Expedition.

Knipping (337) describes an alcohol-burning control device for use with his respiration apparatus and
discusses the necessity of running control experiments 
with all apparatus of this kind. A complete account 
and description, with illustrations, of the various modi- 
fications proposed by Knipping, as well as a wide 
variety of other types of respiratory apparatus, will be 
found in the working manual on the technique of clini- 
cal gaseous metabolism determinations by Knipping 
and Kowitz (242). 

Developments from the Krogh apparatus permit- 
ting the determination of both oxygen and carbon 
dioxide have been described by Liebesny and Schwarz 
(378) and Melli (424). 

In a brief preliminary report (261) and a more com- 
prehensive account (260), Helmreich and Wagner de- 
scribe their difference-spirometer for the graphical 
determination of carbon dioxide and the respiratory 
quotient. Their apparatus is a novel and ingenious 
rearrangement of the usual closed circuit spirometer, 
making it possible to determine the respiratory quotient from the graphical record. Lehmann and Müller (373) describe an elaborate form of closed circuit
apparatus for determining both respiratory gases. It 
is a complex and expensive apparatus only semi-port- 
able. 

Graphical Methods in Respiratory Exchange De- 
termination. It should be obvious that many of the 
instruments, both of the open circuit type and the closed 
circuit type, are adaptable to graphic registration. In 
some cases the description of the apparatus indicates 
that continuous graphic records in a direct reading 
form are produced, and in other instances kymographic attachments are employed in recording respiratory exchange in a somewhat more indirect fashion. To enumerate the various devices in which graphic records are produced would, for the most part, be to relist a large number of the instruments already discussed in preceding sections. Such a tabulation would include the continuous electrometric methods of Gesell and McGinty (213) and the electrical and chemical apparatus of Regelsberger (480), as examples of true graphic
methods, and the devices of Hannan and Lyman (246) 
and Liebesny and Schwarz (378), as examples of 
graphic modifications applicable to typical closed cir- 
cuit methods. 

Other closed circuit developments which we have not previously described are those of Hagedorn (240), Dethloff (158), and Burger and Dusser de Barenne (116, 180, 181). These will be described below.

Hagedorn has developed an elaborate apparatus and 
technique making it possible to record oxygen con- 
sumption and carbon dioxide output graphically. 
There are two gas meters mounted on a common shaft 
and operated as pumps by an electric motor; these con- 
nect to two spirometers each of which records graphi- 
cally on a drum, and the entire assembly must be 
operated under water to insure identical temperatures 
in the meters and spirometers. One spirometer registers carbon dioxide, the other carbon dioxide and oxygen, from which the oxygen may be computed by difference. Dethloff's arrangement is less complex and also records both



MEASURING HUMAN ENERGY COST IN INDUSTRY 459 

gases graphically, and Burger and Dusser de Barenne 
describe three forms of their apparatus as adapted to 
three different ranges of ventilation rates. 
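In Hagedorn's apparatus, as described above, one spirometer registers carbon dioxide and the other carbon dioxide together with oxygen, so each interval's oxygen figure is a subtraction. A Python sketch of that bookkeeping, assuming both records have been read at the same instants and in the same units; the function name and readings are hypothetical:

```python
def oxygen_by_difference(combined_readings, co2_readings):
    """Oxygen for each interval = (CO2 + O2 spirometer) - (CO2 spirometer).

    Both lists must be read at the same instants and in the same units;
    this pairing is an assumption of the sketch, not a detail from the source.
    """
    if len(combined_readings) != len(co2_readings):
        raise ValueError("the two records must have the same number of readings")
    return [combined - co2 for combined, co2 in zip(combined_readings, co2_readings)]


# Hypothetical interval readings in cc.
print(oxygen_by_difference([450, 470], [200, 215]))  # [250, 255]
```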

Graphically recording apparatus of still other types 
is described by Higley and Bowen (278) and Kleiber 
and Wirth (324). The former apparatus consists of 
a rather complex arrangement by means of which 
changes in the weight of the carbon dioxide absorbing 
apparatus, mounted on a balance, are recorded graphi- 
cally by the lever arm of the balance. Excellent curves 
produced with this apparatus are published in their 
article. The device of Kleiber and Wirth is a com- 
plicated but highly ingenious arrangement of pipettes 
and manometers, constructed so that records are made 
on a photographic strip and may be translated later in 
terms of carbon dioxide percentage. Either provisions 
or corrections are made for constant temperature, pres- 
sure, etc., so that the apparatus is fully as accurate as 
the usual volumetric methods. 

Breathing Apparatus for Respiratory Exchange Ex- 
periments. Several general manuals and comprehen- 
sive studies having to do with breathing apparatus in 
respiratory experiments have been mentioned in various preceding sections. Since the proper choice of the type of mask, mouth-piece, valve, or other minor part of respiratory apparatus is often of paramount importance, we shall once more refer to the leading sources of information for assistance in making such a choice, and shall refer as well to certain special studies which we have not previously mentioned.

A careful comparison of various types of valves, nose-pieces, and similar apparatus, as well as a discussion of open and closed circuit methods, was written in 1915
by Carpenter (123), and in 1919 Hendry, Carpenter, 
and Emmes (268) published the results of their ex- 
tensive investigations into every possible combination 
of open and closed circuit methods with mouth-pieces, 
nose-pieces, and half-face masks. Boothby and Sandi- 
ford (104) and Henderson (264) describe spirometers, 
valves, masks, and other equipment for open circuit 
methods, and Knipping and Rona (343) describe all 
such apparatus, as well as tread-mills, ergometers, and 
other accessory devices used in metabolism experi- 
ments. 

Dautrebande and Davies (154) investigated the vari- 
ations in respiratory exchange produced by wearing 
masks of different types. Newcomb (446), Rowe (490), and Knipping (341) describe masks especially adaptable for use in closed circuit arrangements. Bailey (27) speaks of the undesirability of mouth-pieces or nose-pieces and describes a full-size face mask for use with open circuit apparatus; and Rosenheim (484) also speaks of the undesirability of mouth-pieces and proposes a fireman’s mask which he found especially suited for use with young women. Krogh (349) mentions using Stent’s substance with Bohr’s mask and filling the dead space in the mask with plasticene. Moore (431) proposed a combination mouth- and nose-piece, and Williams and Nolting (614) describe a mouth-piece constructed upon proper dental principles.

Bailey (29) describes an efficient air valve having
very low resistance, and valves especially intended for
use with the Douglas bag have been described by Ful- 
ton (200) and Raper (476). A light and simple valve 
made by Soderstrom is described in detail by McCann 
(413). 

Accessory Apparatus in Respiratory Exchange Ex- 
periments. The investigator in respiratory metabolism 
finds it necessary to select not only among various 
major types of apparatus and breathing appliances, but 
must also choose certain minor devices of an accessory nature. Mention has already been made in the preceding sections of information regarding sampling
bottles, gas meters, carbon dioxide absorbents, ergo- 
meters, and such devices used in one way or another in 
respiratory exchange experiments. As a matter of con- 
venience we are assembling here certain of the more 
useful references to these various types of apparatus. 

Bicycle ergometers are described by Campbell, 
Douglas, and Hobson (121), Severinghaus (513), and 
Waller and DeDecker (588). A useful form of treadmill is described by Benedict, Miles, Roth, and Smith
(78). Benedict (54) describes ergometers and related 
devices, and Lehmann (374) describes a special dyna- 
mometer and an arm ergostat. 

The Verdin spirometer is described by Labbe and 
Stevenin (359), the Boullite spirometer by Dreyer 
(171), and a special spirometer by Guthrie (238). 
Amar (10) also describes the Verdin spirometer as 
well as the spirometer of Tissot and many other forms 
of accessory apparatus.

Gas meters are described by Boigey (94) and Newcomer (448). Hartwell and Tweedy (253) supply information concerning maximum, average, and minimum ventilation rates to be expected in respiratory experiments, which should be useful in the selection of gas meters, Douglas bags, etc.

Sampling bottles are described by Bailey (27, 28), 
and McCann (413), on page 852, speaks of a special 
sampling bottle in which glycerine and saturated so- 
dium chloride in equal parts are found to be as satis- 
factory as mercury as well as less expensive. 

Soda-lime is favored as a carbon dioxide absorbent
by Krogh (349), although Guthrie (238) speaks of
the great variations in absorbing power of different 
samples of soda-lime obtainable commercially. Roth 
(484) tells of the difficulties of finding a carbon dioxide 
absorbent which also possesses a high moisture ab- 
sorbing power. Willard and Smith (612) describe a 
highly efficient drying agent. Knipping (341) sup- 
plies a footnote description of the method of preparing 
the potash solution for use in his apparatus. 

Benedict (55) describes a control device developed 
in the Nutrition Laboratory for testing all types of 
respiration apparatus for gas volumes, heat, valve 
leakage, etc. The apparatus is obtainable through W. 
E. Collins, 584 Huntington Ave., Boston. Knipping 
(337) describes a control device for use with his ap- 
paratus, and Seuffert and Nitschke (512) describe an 
experimental control for use with the Voit portable 
respiration apparatus. Goiffon (218) proposes an in- 
strument intended to standardize and simplify the 
taking of samples of alveolar air. 

A number of the studies published by the Nutrition Laboratory carry descriptions of special devices used in metabolism experiments, and should be consulted by workers interested in such special types of equipment. As a single example we shall mention the electrical pulse recorder and the electrical pneumograph described by Benedict, Miles, Roth, and Smith (78).



III

APPLICATIONS AND RESULTS 

Uses of Respiratory Exchange Determination in Industry. These determinations have their widest application in the comparative study of energy costs when one major variable in the environment or the method of work is altered and the others are held as nearly constant as possible. In this way, optimum values may be discovered with respect to the adjustment of working conditions and equipment, or the cost to the worker of various environmental conditions may be revealed. Studies toward determining the best adjustment of working conditions and equipment may be of two sorts: laboratory studies of an artificial situation, set up to investigate certain generalized modes of working or types of equipment, the results to be widely applied industrially; or actual field studies of a specialized industrial operation, usually of more limited application. The two avenues of approach will be considered separately below.

Laboratory Studies of the Components of Industrial Work. The various types of muscular work ordinarily
encountered in industry have been analyzed into their 
essential components and the energy cost of each com- 
ponent has been determined through respiratory ex- 
change determination. Generalized modes of working 
have also been investigated experimentally with a view 
toward determining optimum rates and most efficient 
arrangements of the mechanical system involved. 

One of the commonest devices used in laboratory studies of muscular work is some form of ergometer.
This apparatus is especially suitable for experiments in 
determining the effects of continuous work over periods 
of time and in the determination of optimum rates and 
loads. Benedict and Cathcart (72), using a bicycle 
ergometer, found indications of a loss of bodily ef- 
ficiency at the higher rates of work. Furusawa (202) 
also found that the oxygen required to perform a given 
amount of work varies markedly with the speed of 
work, although he discovered that with constant speed 
and varying load the oxygen requirement rises as a 
linear function of the work done, so that there is no 
optimal load. Hansen (248), also using the bicycle 
ergometer, found decreasing pulmonary ventilation 
with increasing rate up to 50 to 60 revolutions per 
minute. The oxygen consumption and carbon dioxide 
output were also at a minimum at this rate. Optimum loads
were determined as a by-product of the study by 
Bedale (41), and were more directly investigated in 
a later study along similar lines by Cathcart, Bedale, 
Macleod, Weatherhead, and Overton (132). This 
study tended to confirm Bedale’s earlier conclusions. 
Bedale’s experiment will be discussed in a later para- 
graph, as the primary concern was not with rates and 
loads, but with the methods of carrying. Optimum 
rates and loads were also investigated by Jaschwili 
(307), and Lindhard (382) used a known optimum 
rate on the bicycle ergometer in an experimental
investigation of muscular work of short duration.
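Furusawa's conclusion that a linear rise in oxygen requirement leaves no optimal load can be checked with a short derivation. The sketch assumes only the linear form the text reports; $a$ (oxygen required at zero external load) and $b$ (oxygen per unit of work) are hypothetical positive constants, not values from Furusawa's paper.

```latex
\begin{align*}
V(W) &= a + bW && \text{oxygen required at constant speed for work } W\\
\frac{dV}{dW} &= b && \text{marginal oxygen cost, independent of } W\\
\frac{V(W) - a}{W} &= b && \text{net oxygen per unit of work, also independent of } W
\end{align*}
```

Because neither the marginal nor the net cost per unit of work depends on $W$, no particular load minimizes the cost. The gross cost per unit of work, $V(W)/W = b + a/W$, merely falls steadily as the load grows, so it has no interior minimum either.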

Other uses of the ergometer were made by Wang,
Strouse, and Smith (607), who studied the effect of
obesity and undernutrition on muscular efficiency and
fatigue, and by Viale (573, 575, 577) in a series of 
experiments on energy consumption during muscular 
work. 

Benedict and Parmenter (80) and Waller and DeDecker
(588) substitute stair climbing for the bicycle
ergometer in work experiments. 

The most important studies in which various types 
of work are analyzed into their muscular components 
have been carried on in Germany, principally by Atzler 
and his associates. In one study (15), he reports a
detailed investigation of crank turning, using the 
Krogh apparatus, and a similar study of weight lifting, 
using the Zuntz-Geppert apparatus. Ten generaliza- 
tions as to methods of work are presented as a result 
of his study. Some of these are obvious, such as the 
statement that one should use muscle groups adapted
to the task: riding a bicycle is a more efficient means of
locomotion than propelling a wheel-chair, because the
leg muscles are adequate to the task and the lighter arm
muscles are not. Another rule is that motion study
should not eliminate too many apparently needless
movements, as some of them may rest the heavily worked
muscles and lightly exercise others. In fact, in light 
hand work, one should deliberately get up and walk 
around at times to stimulate the circulation. The work 
tempo should be alert, with rapid work and long rest 
pauses, but one should not speed up too greatly on 
heavy work. Static work and unnatural or strained 
body positions should be eliminated or reduced to a
minimum. Burdens should be carried so that the center
of gravity is vertically above the supporting part
of the body. Clothes should be worn that do not
interfere with movements of the body. Energy is wasted
through wearing clothes that are too heavy. These
experiments were reported originally in more detail by 
Atzler, Herbst, and Lehmann (24). Similar material 
will also be found in Atzler’s contribution (17) to 
Körper und Arbeit. A much earlier German
investigation along similar lines is the study by Reach (478)
of the muscular work of rotating the handle of a milk 
separator. 

Another type of investigation which has borne fruit- 
ful results in recent years is the experimental determin- 
ation of energy costs of carrying loads by various 
methods. Bedale’s study (41) was based on only one 
subject, but the technique of the experiment was care- 
fully worked out and the resulting curves showed de- 
cided differences in the economy of weight carrying 
by different methods. Hewitt and Bedale (274) re- 
ported an investigation which was preliminary in 
nature to this one. Klingendahl and Pesonen (326) 
conducted a similar investigation except that two sub- 
jects were used and the experiment was carried on in 
a respiration chamber. Of the three methods studied, 
weight carrying on the back was most economical, 
carrying alternately in each hand was least economical, 
and divided between the two hands was between these 
two in economy. This literature was reviewed by 
Atzler and Herbst (22) who contributed also their own 
Douglas-Haldane study of energy consumption in 
weight carrying on a horizontal plane. These two 
authors have also reported a study (25) of the economy
of pushing and pulling loads on a horizontal surface. 
A number of useful optima were discovered. The 
article is reviewed in the Journal of the American
Medical Association, Volume 88, page 1609, and in
Psychological Abstracts, Volume 1, No. 1154.

Several rules of practice with respect to pushing and 
pulling movements of the arm were derived by 
Lehmann from his study (374) in which dynamometers 
and ergometers were used in conjunction with respira- 
tory exchange determination with the Benedict appara- 
tus. The physiological cost of the muscular move- 
ments involved in barrow work has been determined 
by Crowden (150), and the cost of the work of 
shovelling using shovels of varying heights and handle- 
lengths and with different loads has been investigated 
by Wenzig (610). As we have commented in an earlier
section, one reason why studies of this general charac- 
ter have not been widely made in America will be 
found in our high proportion of mechanical handling 
of what would otherwise be heavy manual labor.

A type of study which has attracted a great deal of 
attention from Scandinavian investigators is the inquiry
into the relationship between positive, negative, and
static work. Johansson (310) and Johansson and 
Koraen (311) reported studies on this problem nearly 
thirty years ago. Johansson demonstrated an increase 
in metabolism in all three types of work, but showed a 
lack of correlation between sense of strain and actual 
metabolism. The question was worked upon later by
Hammarsten (244), using Johansson’s apparatus, and 
by Frumerie (199), who found no relation between
carbon dioxide output in static work and the subjective 
feeling of fatigue. A more recent experiment was per- 
formed by Cathcart and Stevenson (137), using the 
Douglas Bag Method in studying the arm extensors 
and flexors separately in the three types of work. The 
total metabolism during negative work was about three- 
fourths of that during positive, and static work had a 
metabolic cost of about four-fifths that of positive work. 
The subjective fatigue experienced was greatest in the 
static work, however. These authors qualified some 
of their findings through certain theoretical considera- 
tions. 
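The ratios reported by Cathcart and Stevenson can be applied to any measured cost of positive work. The sketch below uses the approximate fractions given in the text (three-fourths for negative work, four-fifths for static work); the positive-work figure of 100 units is a hypothetical input, and `cost_by_work_type` is an illustrative name, not anything from the original study.

```python
from fractions import Fraction

# Approximate metabolic cost of each type of work relative to positive work,
# as reported by Cathcart and Stevenson (137).
RELATIVE_COST = {
    "positive": Fraction(1),
    "negative": Fraction(3, 4),
    "static": Fraction(4, 5),
}


def cost_by_work_type(positive_cost):
    """Scale a measured positive-work cost to each type of work."""
    return {kind: positive_cost * ratio for kind, ratio in RELATIVE_COST.items()}


# Hypothetical positive-work cost of 100 units (any consistent unit).
costs = cost_by_work_type(100)
for kind, value in costs.items():
    print(f"{kind:>8}: {float(value):.0f}")
```

The subjective ranking runs the other way: static work, although cheaper in metabolic terms than positive work, was felt to be the most fatiguing.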

The problem has been attacked more recently by
Dusser de Barenne and Burger (182, 117). They sur- 
veyed the literature in the latter article and reported 
their own experiments in detail in the former. The 
earlier investigations were confirmed in that certain 
types of light static work were found to produce in- 
tense subjective fatigue without materially affecting 
the gaseous metabolism. The most recent of the 
Scandinavian work is that of Hansen, Hvorslev, and 
Lindhard (249), who find no difference in the force 
exerted in actively moving a weight and the static task 
of merely holding it. 

Studies which appear to pertain to this general field 
but which were unavailable to the reviewer are those 
of Abramson (4), Boigey (98), and Efimov (183). 
The first of these articles is said to describe Johansson’s 
apparatus for use in work experiments. The article by 
Efimov was referred to by Polakov (468) but has been 
found to be unavailable in the large reference libraries
of Chicago, New York, Washington, and Boston. 

Energy Cost of Industrial Occupations. Several ef- 
forts have been made to determine the energy cost of 
various industrial jobs, although the interest has usually 
been in assessing the physiological cost per day in
relation to food requirements rather than in determining
differences in physiological cost of the same job per- 
formed under contrasting working conditions. Studies 
of the latter type which are of special significance to 
the industrial psychologist and engineer will be given 
separate treatment, and we shall record here
investigations having the former as their primary aim.

Moss (436) computed the energy expenditure in 
various types of work in a coal mine, using a special 
modification of the Douglas Bag Method. Greenwood,
Hodson, and Tebb (231, abstracted in 230)
determined the total metabolism of female munition
workers, engaged at various tasks and under diverse 
working conditions. A similar study, carefully carried 
out and reported in detail, is the investigation of 
energy expenditure and food requirements of women 
workers contributed by Rosenheim (484). Various 
kinds of farm labor have been investigated by Farkas, 
Geldrich, and Szakall (188), using the Douglas Bag 
Method, and a study of brick laying has been reported 
by Baader and Lehmann (26), using the Benedict ap- 
paratus. 

A composite study of an interesting sort was reported 
a number of years ago by Amar, although the results 
have little practical significance today. Amar studied
the movements involved in using a file and determined
mechanically the most suitable pressure to use, the best 
body posture to assume, and similar factors regarding 
this occupation. He also determined the energy cost 
of filing, although it should be noted that he employed 
a very long tube leading from his subjects to the gas 
meter, which would seem to be rather bad technique. 

Those who are interested in learning the energy cost 
of various occupations may be referred to the work of 
the Royal Society Food (War) Committee (492), and
to Chapter X of the book by Collis and Greenwood 
(141), as well as to a number of standard textbooks on 
physiology and nutrition. A summary of the Scandi- 
navian work on determining the cost of industrial 
occupations in a respiration chamber, using carbon 
dioxide alone as an index, has been written by Becker 
and Hamalainen (40). 

Not much work has been done on the energy cost 
of the lighter industrial tasks, or clerical and other
business occupations. Typewriting alone has been 
carefully studied. The reader may be referred to the 
work of Carpenter and Benedict (126) in 1909 and to 
the more recent work of Ilzhöfer (301), and of
Schroetter (508). The latter appears to be a compre- 
hensive study but was unavailable to the reviewer. It 
is abstracted in Giese’s Methoden der
Wirtschaftspsychologie, pages 501-502.

The energy expenditure involved in light household 
work was studied by Benedict and Johnson (77), using 
a respiration chamber employing carbon dioxide
determinations alone. About twenty subjects were used.
Several occupations were studied, and each experiment
was repeated two or three times. Clear differentiation 
between tasks was shown. A more recent study, more
elaborate in certain respects although employing only
one subject, has been reported by Langworthy and
Barott (363). This was abstracted in an article
published the following year (364). It is of interest
industrially to note the significant differences found in
dishwashing at tables of various heights. A table 85 
centimeters high results in an energy expenditure of 
20 calories per hour above the resting requirement 
and the requirement is increased if the table is material- 
ly raised or lowered. At a height of 100 centimeters 
the requirement is 24 calories and at a height of 65 
centimeters it becomes 30 calories per hour. This
material and other information from these authors was
quoted by Wheeler (611).
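The three table heights reported above can be compared directly by expressing each cost as an excess over the cheapest height. The sketch below uses only the figures quoted in the text (calories per hour above the resting requirement); `excess_over_best` is an illustrative name. Only three heights were reported, so the result says nothing about heights between them.

```python
# Energy expenditure above the resting requirement while dishwashing,
# in calories per hour, by table height in centimeters (Langworthy and Barott).
COST_BY_HEIGHT_CM = {65: 30, 85: 20, 100: 24}


def excess_over_best(costs):
    """Return the cheapest height and each height's extra cost over it."""
    best_height = min(costs, key=costs.get)
    best_cost = costs[best_height]
    extra = {
        height: (cost - best_cost, 100 * (cost - best_cost) / best_cost)
        for height, cost in costs.items()
    }
    return best_height, extra


best, extra = excess_over_best(COST_BY_HEIGHT_CM)
print(f"lowest cost at {best} cm")
for height, (calories, percent) in sorted(extra.items()):
    print(f"{height:>3} cm: +{calories} cal/hr ({percent:.0f}% more)")
```

On these figures the 100 cm table costs 20 per cent more than the 85 cm table, and the 65 cm table costs 50 per cent more.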

Several studies have appeared on the energy cost of 
various athletic sports. Swimming has been studied 
by Waller and DeDecker (599), who employed the
Waller Method, and by Moog and Schwieder (430),
who used loss of weight as an index. The latter
experimenters found that swimming 1000 meters caused
as great a loss of water from the lungs and skin as
running 10,000 meters. Loewy and Knoll (384)
studied the energy exchange in skiing, Geldrich (209)
determined the energy consumption of horseback riding
at various paces, Gullichsen and Soisalon-Soininen
(236) studied the carbon dioxide excretion in fencing
and wrestling, and du Bois-Reymond and Peltret (99a)
determined energy costs of muscular exercise and
tension in gymnastic training. Oxygen consumption data
for certain major sports are supplied by Knoll (344).

A few miscellaneous studies are the determinations 
made of the energy cost of organ playing by Farkas and 
Geldrich (187), the cost of playing ping pong by 
Blomberg and Johnson (86) and of miscellaneous 
minor tasks by Cathcart and Trafford (138). Cathcart
and Orr (136) employed the Douglas-Haldane
method in determining energy costs of various military
duties, computed by the hour, day, and week. Bedale
(42) made Douglas bag determinations of children’s
activities at work and at play in a private school.

Energy Cost of Walking. Such an imposing amount
of work has been done on the physiological cost of 
walking and marching that it demands special treat- 
ment in a separate section. Studies have been carried 
on for the purpose of determining energy cost and food 
requirements, particularly of soldiers on the march, 
and on other phases of greater industrial significance 
such as optimum rate and effect of carrying loads. 

The classic experiment on the physiology of walking 
is that of Zuntz and Schumburg (624) in 1901. This 
study has been criticized in the light of modern physio- 
logical knowledge but received wide recognition at 
one time. Another early investigation was Durig’s 
study (177) in 1911. Durig determined the ventila- 
tion, carbon dioxide production, and oxygen consump- 
tion at various rates, finding some relationship between 
speed and cost but no evidence that slowing within 
reasonable limits increases costs. These and other 
early studies were reviewed by Benedict and
Murschhauser (79) as part of their study in 1915. They
reported detailed results on standing, sitting, walking, and
running at various rates, with and without food. The 
food factor was found to have but slight relation to heat 
production per unit of work. This study was
supplemented by another typically thorough Nutrition
Laboratory investigation carried on by Smith (525). Smith
also found that energy cost is not much affected by rate 
until a rather high speed is reached. 

A distinction is usually made between the acts of 
walking and marching, although under service condi- 
tions it is often not clear that there is a physiological 
difference. Marching might be merely a special gait
of walking or it might be walking under load or a 
combination of these two conditions. We shall notice 
a few of the experiments designated as studies of 
marching. 

Waller has used his methods in several studies of the 
cost of marching under various conditions. In one 
study (586) interesting data on marching costs at var- 
ious speeds will be found, and two others (597, 600) 
supply graphs of the energy cost of marching under 
service conditions, demonstrating a carbon dioxide pro- 
duction of one-tenth cubic centimeter per kilogram 
body weight per horizontal meter. A more elaborate 
study of marching, bayonet drill, and other military 
activities was made by Cathcart and Orr (136), and a 
special study of the optimum rate of marching was
reported by Cathcart, Lothian, and Greenwood (134).
They found an optimum rate of about three miles per 
hour, while Benedict and Parmenter (80), in a recent 
experiment, found the optimal walking speed to be the
somewhat lower rate of 65 meters per minute. These 
workers also found sauntering to be very uneconomical, 
although the accumulated evidence from various 
studies indicates that the optimum walking speed is not 
a sharply defined point, particularly at the lower part 
of the curve. 
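Waller's service-condition figure lends itself to a quick estimate of total carbon dioxide output for a march of given length. In the sketch below the 70 kg body weight and the 1 km distance are hypothetical inputs, `marching_co2_cc` is an illustrative name, and the estimate inherits whatever limitations attach to a carbon-dioxide-only method.

```python
def marching_co2_cc(body_mass_kg, distance_m, cc_per_kg_per_m=0.1):
    """Estimate CO2 output (cubic centimeters) for a march on level ground.

    Uses Waller's service-condition figure of roughly 0.1 cc of CO2 per
    kilogram of body weight per horizontal meter.
    """
    return body_mass_kg * distance_m * cc_per_kg_per_m


# Hypothetical 70 kg marcher covering 1 km.
co2 = marching_co2_cc(70, 1000)
print(f"about {co2 / 1000:.1f} liters of CO2")
```

Because the figure is linear in both body weight and distance, doubling either simply doubles the estimate.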

The studies in which the cost of carrying loads is 
made a part of the investigation of the energy cost of 
walking are of more interest industrially than most of 
the experiments we have been considering. Studer 
(548), using the portable form of the Krogh apparatus,
investigated the metabolism of resting, standing, and
weight carrying. Another study of various factors
affecting the metabolism of walking is the recent
investigation by Basler (35). Atzler and Herbst (25)
studied the economy of walking and of pushing and 
pulling loads, rather than studying the cost of carrying 
them. The optimum walking speed was found to be
only 51.4 meters per minute. Bedale’s study of the 
energy consumption of carrying different loads (41) 
has been referred to in other connections. Atzler and 
Herbst (22) review the literature on this subject 
through 1927. 
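The optimum speeds reported in these studies are stated in different units, which hides how far apart they are. A short conversion sketch puts them on a common scale; the conversion factor (1 mile = 1609.344 meters) is standard, the speeds are those quoted in the text, and the function names are illustrative.

```python
METERS_PER_MILE = 1609.344


def mph_to_m_per_min(mph):
    """Convert miles per hour to meters per minute."""
    return mph * METERS_PER_MILE / 60


def m_per_min_to_km_per_h(m_per_min):
    """Convert meters per minute to kilometers per hour."""
    return m_per_min * 60 / 1000


# Optimum walking or marching speeds as reported in the studies cited above.
optima_m_per_min = {
    "Cathcart, Lothian, and Greenwood (134)": mph_to_m_per_min(3),
    "Benedict and Parmenter (80)": 65.0,
    "Atzler and Herbst (25)": 51.4,
}

for study, speed in optima_m_per_min.items():
    print(f"{study}: {speed:.1f} m/min = {m_per_min_to_km_per_h(speed):.2f} km/h")
```

Converted, three miles per hour is about 80.5 meters per minute, so the reported optima span roughly 51 to 80 meters per minute, which agrees with the observation that the optimum walking speed is not a sharply defined point.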

Studies of the energy cost of transporting loads must 
be based upon one or another of several possible
baselines. Bedale investigated basal, ordinary resting, and
walking without load, finding the energy consumption
to be of the order of 31, 38, and 46, respectively. She found
the second of these best suited to her purposes. The 
study of the physiology of standing by Simonson (520) 
should be of interest to experimenters planning
research on similar questions.
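Why the choice of baseline matters can be seen by subtracting each of Bedale's three figures from the same gross measurement. In the sketch below the gross value of 60 is a hypothetical reading in the same (unstated) units as 31, 38, and 46, and `net_costs` is an illustrative name.

```python
# Bedale's three candidate baselines, in the units of her report
# (the text quotes the values without units).
BASELINES = {"basal": 31, "ordinary resting": 38, "walking without load": 46}


def net_costs(gross):
    """Energy cost attributed to the load under each baseline."""
    return {name: gross - base for name, base in BASELINES.items()}


# Hypothetical gross consumption while carrying a load, same units.
for name, net in net_costs(60).items():
    print(f"{name:>21}: {net}")
```

With this hypothetical reading, the cost attributed to the load ranges from 14 to 29, roughly a twofold difference produced by the choice of baseline alone.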

The physiology of running has been principally 
studied by Hill and his colleagues. We have selected
the work of Furusawa, Hill, and Parkinson (205) and 
Hill and Lupton (287) as examples. In the former 
article, page 50, it is stated that “the fatigue is due to 
the enormous rate of expenditure of energy in running 
at top speed. One subject .... was developing 
horsepower at his maximum velocity and liberating 
more than four grams of lactic acid per second in his 
muscles.”

The cost of walking, both in and out of training, has 
been reported upon by Waller and DeDecker (591), 
and the cost of walking on a slippery floor as compared 
with a smooth one has been investigated by Hietanen, 
Nikkinen, Nyyssola, and Sternberg (276). 

The Study of Industrial Working Conditions.
Metabolic studies which have actually been made in 
industry for the purpose of determining the physiologi- 
cal cost of work under diverse working conditions are 
extremely few in number. In previous sections we 
have noted various studies which have been made in 
relation to such factors as the optimum load, the 
optimum speed, the optimum handle-length of tools, 
the most suitable methods of pushing and pulling and 
performing various tasks, the effect of changing the 
height of the work bench, and other related factors
pertaining in some degree to industrial work. Many of
these have not been worked upon with any degree of 
thoroughness and there is no need of our adding more 
than has already been sketched in preceding sections.

It would seem that by adopting suitable physiologi- 
cal measures it should be possible to investigate a host 
of other industrial labor problems from the standpoint 
of metabolic cost and secure thereby information which 
would be unavailable through any other approach. 

Several metabolic studies have been made of the 
rest pause, and these will be given consideration here, 
but there still remain a variety of questions which have 
scarcely been touched by these methods. It should be 
possible, through respiratory exchange determination, 
not only to adjust the number, duration, and spacing of 
rest pauses, but also to learn something about the proper 
length of the working day and the proportionate energy 
cost of overtime. Even the proportionate cost of night
work and day work might be studied, as well as the 
physiological differences between the five-, six-, and 
seven-day working week. A little work has been done 
on the arrangement of material and the effect of special 
methods and equipment, but there are no metabolic 
studies of the effect of music and rhythmic sounds, of 
monotony and variety in tasks, or of the possible effect 
of differences in the aesthetic appeal of the workroom. 
One or two studies have been made of the effect of noise 
and distraction and of the effect of certain home condi- 
tions, such as loss of sleep, but more work needs to be 
done before practical conclusions can be reached. The 
effect of differences in illumination might be somewhat 
difficult to investigate through respiratory exchange 
techniques, and it is difficult to say whether these 
methods could be made to yield valuable information 
concerning the influence of differences in ventilation. 



One important reason for the meager nature of the 
experimental results on these questions will be found in 
the limitations residing in the respiratory technique
itself. It must be clear to anyone who has pursued even
a small portion of the readings recommended in this 
guide that ordinary routine respiratory exchange de- 
terminations are valueless in studies of mental work, 
and may be of negligible significance in studies of 
muscular work chiefly involving restricted muscle 
groups, or in which there is considerable muscular 
strain but relatively little activity. Many broad classes 
of industrial jobs and nearly all business occupations 
are thus excluded from investigation by these methods, 
although those industrial jobs which remain are still 
in the majority. 

In suggesting the energy cost concept as a substitute 
for the current concepts of industrial fatigue it must be 
borne in mind that reference is made to industrial tasks 
involving principally muscular labor, although this 
labor need not be of a heavy type. Work involving 
muscular strain or visual strain, or work done in 
cramped or unusual positions produces subjective 
fatigue effects out of proportion to the energy cost
involved and may have serious physiological
consequences as well. The subjective fatigue concept is still
valuable, and it becomes more useful if we restrict its
use industrially to cases of this sort rather than use it
as a blanket term covering several diverse and only 
partly related meanings. This will mean that such 
matters as visual fatigue and local muscular fatigue 
will need to be studied by introspective methods or by 
such physiological tests as may be found suitable, or
else reliance will need to be placed upon otherwise un- 
supported studies of the work curve taken in its 
broadest sense. It is possible that sufficient correlation 
will be found between general bodily tension and eye 
strain, for example, to establish measurements of the 
one as measures of the other, but there is not sufficient 
evidence to validate such a practice at the present time. 
Also it is possible that generalizations concerning 
working conditions which are found to hold for manual 
labor will be found by experience to be suitable in 
other instances, but this cannot be determined except 
upon a basis of experience. 

The Effect of Noisy Working Conditions. The 
present writer was originally attracted to the field of 
respiratory exchange determination in industry in 
planning an experiment to measure the physiological 
cost of noisy surroundings in the factory and work- 
room. He found the planning of such an experiment 
to be a more complex task than was anticipated, with 
the result that the experiment has not yet been per- 
formed, but he has accumulated a number of references 
on the subject which will be given consideration here. 

Only one serious effort has been made to measure the 
energy cost of industrial noise, an experiment by Laird, 
reported first by Dixon (164) and later by Laird (360). 
It was abstracted anonymously (11) and has been
reported more recently in greater detail by Laird (360a).
The Douglas Bag Method was used and laborious ef- 
forts were made to control extraneous physiological 
factors. Elaborate apparatus was used in recording the 
amount of work being done on the typewriter and in
standardizing the conditions of the experiment. In 
spite of the impressive description of the experimental 
controls employed and the precautions taken against 
possibility of error, the descriptions of the experiment 
do not reveal the most essential information of all, i.e., 
whether both respiratory gases were measured or 
whether reliance was placed on one of them alone. The 
indications are that only carbon dioxide production
was measured, using the potash side of the Haldane
apparatus. In the absence of protocols or tabular data,
it is impossible to tell whether this is the case, but, if
true, it would greatly vitiate the conclusions drawn.
Since Laird reports as much as a 40 per cent increase 
in metabolism during work under noisy conditions as 
compared with quiet it would seem necessary to have 
complete data before accepting his generalized results. 
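The force of this objection can be illustrated numerically. When only carbon dioxide is measured, the energy it represents depends on the respiratory quotient, which is unknown unless oxygen is measured as well. The sketch below assumes the abbreviated Weir equation, a later convention that neither Laird nor the present writer used, and `energy_from_co2` is an illustrative name.

```python
# Abbreviated Weir equation (an assumed later convention, not used in the
# studies reviewed): energy (kcal) ~= 3.941 * O2 (liters) + 1.106 * CO2 (liters).
WEIR_O2 = 3.941
WEIR_CO2 = 1.106


def energy_from_co2(co2_liters, rq):
    """Energy implied by a CO2 volume at an assumed respiratory quotient (CO2/O2)."""
    o2_liters = co2_liters / rq
    return WEIR_O2 * o2_liters + WEIR_CO2 * co2_liters


for rq in (0.7, 0.85, 1.0):
    print(f"RQ {rq:.2f}: 1 liter of CO2 ~ {energy_from_co2(1.0, rq):.2f} kcal")
```

Under this assumption the same liter of carbon dioxide corresponds to about 6.74 kcal at a quotient of 0.7 but only about 5.05 kcal at 1.0, a spread of roughly one third, not far short of the 40 per cent increase Laird reported.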

The only other studies of the effect of noise are output
studies such as that of Kornhauser (346), general
observations on the desirability of sound-proofing as in
the article by Hannam (245), generalized observations
on the clinical and medical aspects of the problem as
written by Gilbert (216), and arm-chair deductions
concerning the physiological effects of noise such as
those contributed by Hoogheimstra (296).

Several articles have been written by Spooner, in 
the first of which (534) he presented a classification
of noise producers, which may have been useful at that
date and which was copied without credit by Mowery
(438). The only other material in Mowery’s article 
consisted of either commonplace or else inaccurate 
statements of fact, and the nature of Spooner’s article
may be inferred from the following quotation : “With- 
out attempting minutely to describe the ear, which has 
been called the second gateway of wisdom and is re- 
garded as the most complex of all parts of the human 
frame,” etc., etc. Spooner describes Low's audiometer 
in a later article (536), which is duplicated in another 
reference (535) . An article full of generalizations and 
pictures was written recently by Faulkner for Factory 
and this was abstracted anonymously (13) in the 
Monthly Labor Review. 

Morgan (432) has contributed a splendid laboratory 
investigation of the effect of distracting noises and is 
now engaged upon an electrocardiograph study which 
was given a preliminary report by Snook (528). Both 
of the methods used by Morgan are useful techniques 
in studying problems of this sort in the laboratory, but 
they are beyond the scope of a paper primarily con- 
cerned with respiratory exchange determinations in 
industry and so will not be treated here in detail. 

Rest Pauses. About the only metabolic work of in- 
dustrial importance which has not been noted at some 
length in earlier sections of the present study is the 
experimentation which has been done with respect to 
the rest pause. The influence of rest pauses has been 
studied in several experiments, although a great deal 
of work remains to be done in connection with this im- 
portant problem. 

A good modern summary of metabolic and other 
studies of the rest pause has been contributed to the 
literature by Graf (224). Simonson (516) reported 
a lecture on fatigue and recovery after physical work
in which he discussed the study of the metabolic effects
of rest pauses, holidays, and vacations by means of
determining lactic acid production and gaseous exchange.
Strauss and Bandman (W7) speak favorably of
metabolic studies of the effect of rest pauses, commending
in particular the work of Hill and his colleagues on the
recovery period. Messerle (426) reviewed the literature
and reported his electrocardiograph experiment on
work and rest pauses. Shepard (515) used loss of
weight to discover effects of rest pauses in work, find- 
ing that rest pauses have more of an influence on such 
loss than does output. Many of the curves published 
by Waller have an important bearing on the physiology 
of the rest pause, as he himself has stated and as has
been pointed out by Page (458). In view of the fact 
that Waller’s experimental results in this connection 
have been subjected to criticism, it is interesting to note 
that Benedict and Cathcart (72) found a drop in
efficiency after the third hour of muscular exercise.
These experimenters found a proportionate increase in
oxygen consumption, roughly paralleling the proportionate
increase in carbon dioxide output that Waller had
reported, although the rise in oxygen consumption was
not so pronounced.

This concludes our catalogue of original investiga- 
tions of a metabolic nature which have a bearing on 
the adjustment of working conditions to the worker in 
industry. There is a vast field still untouched, although 
new experiments are constantly being reported. One
who is interested in this literature should refer not only
to the physiological sources but should also investigate
the engineering references from the time of Taylor
(557) down through the modern handbooks such as
that written by Dana (152), as well as material
appearing in the engineering journals. Similarly, access
should be had to the psychological periodicals and to 
psychological texts and reference books such as those 
of Giese (214), Poffenberger (467), and Weber (609). 
There is not much material on metabolic studies in in- 
dustry in either of the German references mentioned, 
although such books often carry material which is at 
least of incidental interest. Poffenberger’s text carries 
more material of this type than is usually found in text- 
books on applied psychology. 

Highly pertinent material is often contained in
reports of technical experiments in psychology, such as the
discussion of the fatigue concept in Crawley’s study
(148) and the theoretical considerations on
interpreting the work curve contributed by Farmer (189) and
Spencer (533a).

Metabolic Tests of Physical Efficiency. There is 
one other possible use to which respiratory exchange 
determinations may be put in industry. Benedict (61) 
is inclining toward the belief that basal metabolic rate 
may be of significance as an indicator of the relative 
physical fitness, the level of vital activity, or the 
physical powers of an individual. A low basal meta- 
bolism seems to indicate a lowered physical fitness or 
vigor as compared with the vitality when a higher 
metabolic rate is found. 

A still more obvious direction of industrial research 
would be the duplication in a modified routine form
of experiments resembling those of Benedict and Cath- 
cart (72) on the efficiency of the human body as a 
machine. Indeed, this has already been suggested by 
Atzler (15) who uses respiratory tests as a means of 
selecting individuals for muscular work on the basis 
of their physical efficiency. Briggs (112, 113) speaks 
of the effect of athletic training in economizing pul- 
monary ventilation by utilizing a higher percentage 
of oxygen from the air, and excreting a higher per- 
centage of carbon dioxide. He states that the normal 
percentage may even be doubled in the case of well- 
trained subjects and indicates that this may be a
possible measure of physical fitness.
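The sketch below illustrates the two quantities this paragraph refers to. It computes (a) gross mechanical efficiency, the ratio of external work done to energy expended as estimated from oxygen uptake, in the spirit of the Benedict and Cathcart experiments on the body as a machine. It also computes (b) the air ventilation needed to supply a fixed oxygen uptake at a given fraction of oxygen extracted from the inspired air. Part (b) shows why doubling that fraction, as Briggs reports for well-trained subjects, halves the ventilation required. Several things are assumptions chosen for illustration and are not data from the studies cited: the caloric equivalent of 4.825 kcal per litre of oxygen (roughly the value at a respiratory quotient near 0.82), the simplification that inspired and expired volumes are equal, and all numerical inputs.

```python
"""Illustrative arithmetic only: every input below is hypothetical and is
not taken from Benedict and Cathcart (72) or Briggs (112, 113)."""

JOULES_PER_KCAL = 4184.0
# Assumed caloric equivalent of oxygen, kcal per litre of O2
# (approximate value at a respiratory quotient near 0.82).
KCAL_PER_LITRE_O2 = 4.825


def gross_efficiency(work_watts: float, o2_litres_per_min: float) -> float:
    """External work per minute divided by total energy expended per minute."""
    work_kcal_per_min = work_watts * 60.0 / JOULES_PER_KCAL
    energy_kcal_per_min = o2_litres_per_min * KCAL_PER_LITRE_O2
    return work_kcal_per_min / energy_kcal_per_min


def ventilation_needed(o2_litres_per_min: float, o2_extracted_fraction: float) -> float:
    """Litres of air per minute needed to supply the given O2 uptake.

    o2_extracted_fraction is the inspired minus the expired O2 fraction,
    e.g. 0.04 means 4 percent of each litre of air is taken up as oxygen.
    Simplification (assumption): inspired and expired volumes are equal.
    """
    if o2_extracted_fraction <= 0:
        raise ValueError("extracted fraction must be positive")
    return o2_litres_per_min / o2_extracted_fraction


if __name__ == "__main__":
    o2_uptake = 1.5  # hypothetical litres of O2 per minute during ergometer work
    eff = gross_efficiency(work_watts=100.0, o2_litres_per_min=o2_uptake)
    print(f"gross efficiency: {eff:.1%}")

    ordinary = ventilation_needed(o2_uptake, 0.04)
    trained = ventilation_needed(o2_uptake, 0.08)  # extraction percentage doubled
    print(f"ventilation: {ordinary:.1f} L/min vs {trained:.1f} L/min")
```

Net efficiency, which subtracts a resting or baseline energy expenditure before dividing, would use the same ratio with a smaller denominator.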

These measures do not appear to have been widely
adopted, for there is almost no material upon them in
the books by Dreyer (171) or Bovard and Cozens
(109), nor in the review by Martin (405). There is a
single quotation from Schneider in the book by Bovard
and Cozens: “Respiration tests are unpopular because
they require a highly trained gas analyst and because
most of us alter our rate and depth of breathing when
we are being watched.”

The first part of Schneider’s assertion is, of course,
untrue, and there is reason to believe that the difficulty
he mentions in the second part of the quotation is one
which can be overcome by using suitable precautions.
Schneider’s own physical test (500) is based upon the
effect of exercise upon pulse rate and blood pressure.
The rating it provides should be useful in industry 
although the test was originally developed for use with 
aviators. Scott (510) commends Schneider's point- 
scale rating and supplies additional data. 
It is anticipated that this guide to the literature on respiratory
exchange measurement will be used by many who are relatively
unfamiliar with the sources of physiological, psychological, and medical
literature. Such persons may need assistance in extending the
references in the special field of their proposed inquiry, as well as in
bringing their references down to date. For this reason we are repeating
the titles of the general reference books of most value in a biblio- 
graphic way, as well as listing the periodical indices and abstract 
journals available in fields related to the subject matter of the 
various sections of this book. It is to be understood that the refer- 
ences in the bibliography are not complete or exhaustive in number, 
but it should be possible to amplify the bibliographic materials for 
any section with the aid of the sources given below. 

Fatigue 

Durig (176), about 200 references on industrial fatigue. 

Durig (178), nearly 500 titles on theory of fatigue and its
measurement.

Spaeth (533), several hundred titles on "The Problem of Fatigue." 

Collis and Greenwood (141), brief bibliographies on fatigue and
related subjects.

Frumerie (199), early literature on relation between carbon
dioxide output and fatigue due to static work; 41 titles.

International Labor Office (302), bibliography on visual fatigue
and protection of eyesight.

Bovard and Cozens (109), bibliography on physical tests; some 
useful references in relation to fatigue. 

General Physiology 

Bayliss (38), pp. 741-846. Titles classified by author but not by
subject.

Morse (435), general references, pp. 23-26. Also chapter bibli- 
ographies. 

Abstract of the Literature of Industrial Hygiene (626), Volume
1-, 1919-. (Published as part of the Journal of Industrial
Hygiene.)

American Journal of Physiology Index (627), Volume 1, 1898- 
1912; Volume 2, 1912-1922. 
Arbeitsphysiologie: Zeitschrift für die Physiologie des Menschen
bei Arbeit und Sport (628), Volume 1-, 1928-.

Respiratory Physiology 

Krogh (354), well-organized bibliography to 1916.

Gesell (212), 141 titles on regulation of respiration.

Knoll (344), 42 titles on respiration in sports.

Means (422), bibliography on dyspnoea and related topics.

Basal Metabolism

Boothby and Sandiford (102), remarkably complete bibliography
of about 700 titles, arranged chiefly by institution of origin. 

King (321), 300 titles, recent and valuable.

Boothby and Sandiford (105), 28 titles, principally on surface
area.

Du Bois (172), numerous footnote references on all phases.

Grafe (225), extensive bibliographies on every phase of human 
metabolism. 

Krauss (348), 114 titles on gaseous metabolism. 

Benedict (56), 15 references of a general nature. 

McCann (412), 304 titles on calorimetry in medicine.

Murlin (442), 209 titles on metabolism in infancy and childhood.

Talbott (555), 169 titles on basal metabolism in children.

Physiology of Muscular Exercise

Hill (281), extensive references up to 1924. 

Hill (284), bibliography continued from 1924 to 1927. 

Hill (283), selected bibliography of 31 titles on metabolism of
work.

Hill, Long, and Lupton (285), bibliography on muscle physiology.

Hill and Lupton (286), references on muscular exercise and
muscle physiology.

Briggs (112), brief bibliography on physical exertion and
respiration.

Barr, Himwich, and Green (34), good bibliography on blood
chemistry of muscular exercise.

McCurdy (418), good bibliography on physiology of exercise.

Bainbridge (30), French, German, and English titles on general
physiology of muscular work.

Walther (605), French bibliography on work physiology.

Campbell, Douglas, and Hobson (121), references on the respira- 
tory effects of work. 

Cathcart (130), 77 references on muscle work and protein meta- 
bolism. 

Cathcart, Lothian, and Greenwood (134), bibliography and sum- 
mary of early work on walking and marching. 

Cathcart and Orr (136), bibliography on marching and related
work.


Miscellaneous Factors Affecting Metabolism

Miller (428), 150 titles in bibliography of general nature.

Morgulis (433), 1000 references on fasting and related topics.

Cathcart and Burnett (133), 33 titles on effect of diet.

Sundstroem (551), 194 references on the effect of climate.

Landis (362), 25 references on the effect of emotion. 

Zeigler and Levine (622), 20 titles on effect of emotion on meta- 
bolism. 


Mental Work

Grafe (225), 23 titles on mental work, as well as good
bibliography on emotional states and mental disorders.

Day (157), good bibliography on mental work. 

Benedict and Carpenter (69), review of early literature with 
bibliography. 

Gillespie (215), good bibliography on influence of mental and
muscular work on blood pressure and pulse rate.


Industrial Psychology and Management 

Rossi and Rossi (649), comprehensive bibliography on personnel 
administration, including titles on fatigue, working conditions, and 
related topics. 

Cannons (635), all phases of industrial efficiency and factory
management, but nothing on energy consumption because of early date.

Beresford (629), lists only 9 books and about 40 periodical refer- 
ences on industrial psychology. 

Berg (630), selective bibliography of English titles on manage- 
ment. A few references on fatigue and other personnel problems. 

Society of Industrial Engineers (633), 23-page bibliography on
industrial engineering and management; published too early to
include material on metabolic studies.

International Labor Office (632), unavailable to reviewer.

General Abstracts, Reviews, and Indexes

Index Medicus, First Series (637), a monthly periodical
published 1879-1899, indexing general physiological and medical
literature.

Index Medicus, Second Series (638), a monthly periodical
published 1903-1920; essentially a continuation of Series One.

Index Medicus, Third Series (639), a quarterly index to medical
and physiological literature, published 1921-1927. Covers world
literature on medical subjects in the widest sense.

Quarterly Cumulative Index (647), published quarterly 1916-1926.
Not quite as comprehensive as the Index Medicus, Third
Series, and having the further drawback that foreign titles appear
in translation rather than in the original language.

Quarterly Cumulative Index Medicus (648), began publication
in 1927 as a merger of the Index Medicus and the Quarterly Cumulative
Index. Complete index to the world literature on medical,
physiological, biological, and related subjects.

Index Catalogue of the Library of the Surgeon-General’s Office,
U. S. Army (636), began publication in 1880; now in the Third
Series. The Third Series, when complete, will constitute a practically
complete world index.

Berichte über die gesamte Physiologie und experimentelle
Pharmakologie (631), Volume 1-, 1920-.

Jahresbericht über die gesamte Physiologie und experimentelle
Pharmakologie mit vollständiger Bibliographie (640), Annual,
Volume 1-, 1920-.

Journal of the National Institute of Industrial Psychology (641),
carries reviews and abstracts. Volume 1-, 1922-1923.

Physiological Abstracts (642), a monthly periodical, starting
publication in 1916.

Physiological Reviews (643), a quarterly publication carrying
reviews of physiological subjects, starting publication in 1921.

Psychological Index (646), an annual index to world
psychological literature, starting publication 1894.
Psychological Abstracts (644), monthly journal carrying abstracts
of selected world literature, beginning publication 1927.

Psychological Bulletin (645), carries abstracts and reviews of psy- 
chological literature, starting publication 1904. 

Bulletin of the Public Affairs Information Service (634), carries
an index to literature on fatigue, working conditions, and related
topics. Volume 1-, 1915-.

Abstract of the Literature of Industrial Hygiene (626), Vol. 1-,
1919-. Appears as a part of The Journal of Industrial Hygiene,
Harvard School of Public Health.
THE MEASUREMENT OF HUMAN ENERGY EXPENDITURE IN INDUSTRY

A GENERAL GUIDE TO THE LITERATURE

This monograph aims to present to American readers what has been
written on the measurement of energy expenditure. Although German
scientists have made many studies of the metabolism of work that are
of industrial interest, and French and English investigators have made
fairly extensive use of respiratory exchange determinations in
industrial studies, such experiments have been carried out in the
United States only to a very limited extent. There is no general
manual in English on the methods, apparatus, and results in the field
of measuring the human energy expended in work, so that
experimentation of this kind in the laboratory or in industry is
impossible for industrial psychologists and industrial engineers
unless they know physiology very well or can read the German
handbooks on the subject.

This monograph has been written solely to serve as a guide to what
has been written on the subject, and it presents no results of
original experimentation. Since there are no experimental conclusions
to summarize, only a list of the various topics and headings is given
below, showing the organization and the subject matter discussed.

Physiological Foundations. Theory of the determination of respiratory
exchange; fatigue and energy loss; the measurement of fatigue; general
physiology; basal metabolism; general metabolic exchange and energy
metabolism; respiratory physiology and blood chemistry in relation to
muscular work; the biochemistry and dynamics of muscular action; the
physiology of muscular work; the respiratory quotient; the computation
of calories; factors influencing metabolism; physiological factors
subject to experimental control; mental work; environmental factors
influencing metabolism; general references on factors influencing
metabolism.

Apparatus and Methods. General handbooks on apparatus and methods;
quantitative and non-quantitative indices of metabolic rate; direct
calorimetry; indirect calorimetry; open-circuit methods, gasometer
type; open-circuit methods, Douglas bag type; critique and defense of
the Waller method; gas analysis by the Waller method; other simplified
methods of gas analysis; other open-circuit methods of determining
metabolic rate; total pulmonary ventilation as an index of energy
expended in light work; closed-circuit methods for determining oxygen
consumption and carbon dioxide output; graphic methods for determining
respiratory exchange; breathing apparatus for respiratory exchange
experiments; accessory apparatus for respiratory exchange experiments.

Applications and Results. Uses of respiratory exchange determination
in industry; laboratory studies of the components of industrial work;
energy expended in industrial occupations; energy expended in walking;
the study of industrial working conditions; the effect of noise in
industrial work; rest periods; metabolic tests of physical efficiency.

Fade 
THE MEASUREMENT OF HUMAN ENERGY EXPENDITURE IN
INDUSTRY: A GENERAL GUIDE
TO THE LITERATURE
(Abstract)

The purpose of this monograph is to present to American readers the
literature on the measurement of energy expenditure. Although German
investigators have carried out many studies of metabolic exchange
during work that are of industrial interest, and although French and
English workers have made fairly wide use of determinations of
respiratory exchange in industrial investigations, such investigations
have been carried out in the United States only to a very limited
extent.

There is no general handbook in English on methods, apparatus, or
findings in the field of measuring the energy cost of human work.
Research of this kind in the laboratory or in industry therefore
remains closed to industrial psychologists and engineers unless they
are thoroughly trained in physiology or can read the German handbooks
on the subject.

The present monograph is intended solely as a guide to the
literature, and it reports no findings from original experiments.
Since there are no experimental findings to report, we shall merely
list the various topics and headings, which will set forth the plan
and subject matter of the study.

Physiological Foundations. Theory of the determination of respiratory
exchange; fatigue and energy expenditure; the measurement of fatigue;
general physiology; basal metabolism; general metabolic exchange and
energy exchange; respiratory physiology and blood chemistry in
relation to muscular work; biochemistry and dynamics of muscular
activity; physiology of muscular work; the respiratory quotient; the
computation of calories; factors influencing metabolism; physiological
influences that can be experimentally controlled; mental work;
environmental influences on metabolism; general literature on
influences affecting metabolism.

Apparatus and Methods. General handbooks on apparatus and methods;
quantitative and non-quantitative indices of metabolic rate; direct
calorimetry; indirect calorimetry; open-circuit methods of the Douglas
bag type; the Waller simplification of the Douglas bag method;
critique and defense of the Waller method; other open-circuit methods
for determining metabolic rate; total pulmonary ventilation as an
index of energy expenditure in light work; closed-circuit methods for
determining oxygen consumption and carbon dioxide output; graphic
methods for determining respiratory exchange; breathing apparatus for
respiratory exchange experiments; accessory apparatus for respiratory
exchange experiments.
Applications and Results. Uses of respiratory exchange determination
in industry; laboratory studies of the components of industrial work;
energy expenditure in industrial occupations; energy expenditure in
walking; the study of industrial working conditions; the effect of
noisy working conditions; rest pauses; metabolic tests of physical
efficiency.